SYSTEM FOR IN VITRO TRANSPOSITION USING MODIFIED TN5 TRANSPOSASE

CROSS-REFERENCE TO RELATED APPLICATION
This patent application is a continuation-in-part of a 
patent application entitled "System for In Vitro 
Transposition," filed March 11, 1997, for which no serial 
number has yet been accorded. Applicants have petitioned for a
filing date of September 9, 1996 to be accorded to the parent
application.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT 
Not applicable. 

BACKGROUND OF THE INVENTION

The present invention relates generally to the field of
transposable nucleic acid and, more particularly, to production
and use of a modified transposase enzyme in a system for
introducing genetic changes to nucleic acid.

Transposable genetic elements are DNA sequences, found in
a wide variety of prokaryotic and eukaryotic organisms, that
can move or transpose from one position to another position in
a genome. In vivo, intra-chromosomal transpositions as well as
transpositions between chromosomal and non-chromosomal genetic
material are known. In several systems, transposition is known
to be under the control of a transposase enzyme that is
typically encoded by the transposable element. The genetic
structures and transposition mechanisms of various transposable
elements are summarized, for example, in "Transposable Genetic
Elements" in "The Encyclopedia of Molecular Biology," Kendrew
and Lawrence, Eds., Blackwell Science, Ltd., Oxford (1994),
incorporated herein by reference.

In vitro transposition systems that utilize the particular
transposable elements of bacteriophage Mu and bacterial
transposon Tn10 have been described by the research groups of
Kiyoshi Mizuuchi and Nancy Kleckner, respectively.

The bacteriophage Mu system was first described by
Mizuuchi, K., "In Vitro Transposition of Bacteriophage Mu: A
Biochemical Approach to a Novel Replication Reaction,"
Cell 35:785-794 (1983) and Craigie, R. et al., "A Defined System
for the DNA Strand-Transfer Reaction at the Initiation of
Bacteriophage Mu Transposition: Protein and DNA Substrate
Requirements," P.N.A.S. U.S.A. 82:7570-7574 (1985). The DNA
donor substrate (mini-Mu) for the Mu in vitro reaction normally
requires six Mu transposase binding sites (three of about 30 bp
at each end) and an enhancer sequence located about 1 kb from
the left end. The donor plasmid must be supercoiled. Proteins
required are Mu-encoded A and B proteins and host-encoded HU
and IHF proteins. Lavoie, B.D. and G. Chaconas, "Transposition
of phage Mu DNA," Curr. Topics Microbiol. Immunol. 204:83-99
(1995). The Mu-based system is disfavored for in vitro
transposition system applications because the Mu termini are
complex and sophisticated and because transposition requires
additional proteins above and beyond the transposase.

The Tn10 system was described by Morisato, D. and N.
Kleckner, "Tn10 Transposition and Circle Formation in vitro,"
Cell 51:101-111 (1987) and by Benjamin, H. W. and N. Kleckner,
"Excision Of Tn10 from the Donor Site During Transposition
Occurs By Flush Double-Strand Cleavages at the Transposon
Termini," P.N.A.S. U.S.A. 89:4648-4652 (1992). The Tn10 system
involves a supercoiled circular DNA molecule carrying the
transposable element (or a linear DNA molecule plus E. coli IHF
protein). The transposable element is defined by complex 42 bp
terminal sequences with an IHF binding site adjacent to the
inverted repeat. In fact, even longer (81 bp) ends of Tn10
were used in reported experiments. Sakai, J. et al.,
"Identification and Characterization of Pre-Cleavage Synaptic
Complex that is an Early Intermediate in Tn10 transposition,"
E.M.B.O. J. 14:4374-4383 (1995). In the Tn10 system, chemical
treatment of the transposase protein is essential to support
active transposition. In addition, the termini of the Tn10
element limit its utility in a generalized in vitro
transposition system.

Both the Mu- and Tn10-based in vitro transposition systems
are further limited in that they are active only on covalently
closed circular, supercoiled DNA targets. What is desired is a
more broadly applicable in vitro transposition system that
utilizes shorter, more well defined termini and which is active
on target DNA of any structure (linear, relaxed circular, and
supercoiled circular DNA).

BRIEF SUMMARY OF THE INVENTION 
The present invention is summarized in that an in vitro
transposition system comprises a preparation of a suitably
modified transposase of bacterial transposon Tn5, a donor DNA
molecule that includes a transposable element, and a target DNA
molecule into which the transposable element can transpose, all
provided in a suitable reaction buffer.

The transposable element of the donor DNA molecule is
characterized as a transposable DNA sequence of interest, the
DNA sequence of interest being flanked at its 5'- and 3'-ends
by short repeat sequences that are acted upon in trans by Tn5
transposase.

The invention is further summarized in that the suitably
modified transposase enzyme comprises two classes of
differences from wild type Tn5 transposase, where each class
has a separate measurable effect upon the overall transposition
activity of the enzyme and where a greater effect is observed
when both modifications are present. The suitably modified
enzyme both (1) binds to the repeat sequences of the donor DNA
with greater avidity than wild type Tn5 transposase ("class (1)
mutation") and (2) is less likely than the wild type protein to
assume an inactive multimeric form ("class (2) mutation"). A
suitably modified Tn5 transposase of the present invention that
contains both class (1) and class (2) modifications induces at
least about 100-fold (±10%) more transposition than the wild
type enzyme, when tested in combination in an in vivo
conjugation assay as described by Weinreich, M.D., "Evidence
that the cis Preference of the Tn5 Transposase is Caused by
Nonproductive Multimerization," Genes and Development 8:2363-
2374 (1994), incorporated herein by reference. Under optimal
conditions, transposition using the modified transposase may be
higher. A modified transposase containing only a class (1)
mutation binds to the repeat sequences with sufficiently
greater avidity than the wild type Tn5 transposase that such a
Tn5 transposase induces about 5- to 50-fold more transposition
than the wild type enzyme, when measured in vivo. A modified
transposase containing only a class (2) mutation is
sufficiently less likely than the wild type Tn5 transposase to
assume the multimeric form that such a Tn5 transposase also
induces about 5- to 50-fold more transposition than the wild
type enzyme, when measured in vivo.

In another aspect, the invention is summarized in that a
method for transposing the transposable element from the donor
DNA into the target DNA in vitro includes the steps of mixing
together the suitably modified Tn5 transposase protein, the
donor DNA, and the target DNA in a suitable reaction buffer,
allowing the enzyme to bind to the flanking repeat sequences of
the donor DNA at a temperature greater than 0°C, but no higher
than about 28°C, and then raising the temperature to
physiological temperature (about 37°C) whereupon cleavage and
strand transfer can occur.

It is an object of the present invention to provide a
useful in vitro transposition system having few structural
requirements and high efficiency.

It is another object of the present invention to provide a
method that can be broadly applied in various ways, such as to
create absolute defective mutants, to provide selective markers
to target DNA, to provide portable regions of homology to a
target DNA, to facilitate insertion of specialized DNA
sequences into target DNA, to provide primer binding sites or
tags for DNA sequencing, to facilitate production of genetic
fusions for gene expression studies and protein domain mapping,
as well as to bring together other desired combinations of DNA
sequences (combinatorial genetics).

It is a feature of the present invention that the modified
transposase enzyme binds more tightly to DNA than does wild
type Tn5 transposase.

It is an advantage of the present invention that the
modified transposase facilitates in vitro transposition
reaction rates of at least about 100-fold higher than can be
achieved using wild type transposase (as measured in vivo). It
is noted that the wild-type Tn5 transposase shows no detectable
in vitro activity in the system of the present invention.
Thus, while it is difficult to calculate an upper limit to the
increase in activity, it is clear that hundreds, if not
thousands, of colonies are observed when the products of in
vitro transposition are assayed in vivo.

It is another advantage of the present invention that in 
vitro transposition using this system can utilize donor DNA and 
target DNA that is circular or linear. 

It is yet another advantage of the present invention that
in vitro transposition using this system requires no outside
high energy source and no protein other than the modified
transposase.

Other objects, features, and advantages of the present 
invention will become apparent upon consideration of the 
following detailed description. 

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS 
Fig. 1 depicts test plasmid pRZTL1, used herein to
demonstrate transposition in vitro of a transposable element
located between a pair of Tn5 outside end termini. Plasmid
pRZTL1 is also shown and described in SEQ ID NO:3.

Fig. 2 depicts an electrophoretic analysis of plasmid
pRZTL1 before and after in vitro transposition. Data obtained
using both circular and linear plasmid substrates are shown.

Fig. 3 is an electrophoretic analysis of plasmid pRZTL1
after in vitro transposition, including further analysis of the
molecular species obtained using circular and linear plasmid
substrates.

Fig. 4 shows plasmids pRZ1496, pRZ5451 and pRZTL1, which
are detailed in the specification.

Fig. 5 shows a plot of papillae per colony over time for
various mutant OE sequences tested in vivo against EK54/MA56
transposase.

Fig. 6 shows a plot of papillae per colony over time for
various mutant OE sequences with a smaller Y-axis than is shown
in Fig. 5 tested against EK54/MA56 transposase.

Fig. 7 shows a plot of papillae per colony over time for
various mutant OE sequences tested against MA56 Tn5
transposase.

Fig. 8 shows in vivo transposition using two preferred
mutants, tested against MA56 and EK54/MA56 transposase.

DETAILED DESCRIPTION OF THE INVENTION 
It will be appreciated that this technique provides a
simple, in vitro system for introducing any transposable
element from a donor DNA into a target DNA. It is generally
accepted and understood that Tn5 transposition requires only a
pair of OE termini, located to either side of the transposable
element. These OE termini are generally thought to be 18 or 19
bases in length and are inverted repeats relative to one
another. Johnson, R. C. and W. S. Reznikoff, Nature 304:280
(1983), incorporated herein by reference. The Tn5 inverted
repeat sequences, which are referred to as "termini" even
though they need not be at the termini of the donor DNA
molecule, are well known and understood.

Apart from the need to flank the desired transposable
element with standard Tn5 outside end ("OE") termini, few other
requirements on either the donor DNA or the target DNA are
envisioned. It is thought that Tn5 has few, if any,
preferences for insertion sites, so it is possible to use the
system to introduce desired sequences at random into target
DNA. Therefore, it is believed that this method, employing the
modified transposase described herein and a simple donor DNA,
is broadly applicable to introduce changes into any target DNA,
without regard to its nucleotide sequence. It will, thus, be
applied to many problems of interest to those skilled in the
art of molecular biology.



In the method, the modified transposase protein is
combined in a suitable reaction buffer with the donor DNA and
the target DNA. A suitable reaction buffer permits the
transposition reaction to occur. A preferred, but not
necessarily optimized, buffer contains spermidine to condense
the DNA, glutamate, and magnesium, as well as a detergent,
which is preferably 3-[(3-cholamidopropyl)dimethyl-ammonio]-1-
propane sulfonate ("CHAPS"). The mixture can be incubated at a
temperature greater than 0°C and as high as about 28°C to
facilitate binding of the enzyme to the OE termini. Under the
buffer conditions used by the inventors in the Examples, a
pretreatment temperature of 30°C was not adequate. A preferred
temperature range is between 16°C and 28°C. A most preferred
pretreatment temperature is about 20°C. Under different buffer
conditions, however, it may be possible to use other below-
physiological temperatures for the binding step. After a short
pretreatment period of time (which has not been optimized, but
which may be as little as 30 minutes or as much as 2 hours, and
is typically 1 hour), the reaction mixture is diluted with 2
volumes of a suitable reaction buffer and shifted to
physiological conditions for several more hours (say 2-3 hours)
to permit cleavage and strand transfer to occur. A temperature
of 37°C, or thereabouts, is adequate. After about 3 hours, the
rate of transposition decreases markedly. The reaction can be
stopped by phenol-chloroform extraction and can then be
desalted by ethanol precipitation.
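
For illustration only, the two-temperature incubation described above can be written down as a small data structure. The following Python sketch is not part of the disclosed method; the names Step and two_step_protocol are hypothetical, the default values are the typical figures given in the text (about 20°C for 1 hour, then about 37°C for up to 3 hours), and the "about 28°C" upper bound is treated as a hard limit purely for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One stage of the reaction (hypothetical helper type)."""
    name: str
    temp_c: float
    minutes: int


def two_step_protocol(binding_temp_c: float = 20.0,
                      binding_minutes: int = 60,
                      transfer_minutes: int = 180) -> list[Step]:
    """Build the binding stage followed by cleavage and strand transfer."""
    # The text describes binding above 0 degrees C and up to about 28 degrees C;
    # 30 degrees C was reported as not adequate under the example buffer.
    if not (0.0 < binding_temp_c <= 28.0):
        raise ValueError("binding temperature must be above 0 and at most about 28 C")
    return [
        Step("bind transposase to OE termini", binding_temp_c, binding_minutes),
        Step("dilute with 2 volumes of reaction buffer", binding_temp_c, 0),
        Step("cleavage and strand transfer", 37.0, transfer_minutes),
    ]


protocol = two_step_protocol()
total_minutes = sum(step.minutes for step in protocol)
```

With the defaults, the incubation time totals 240 minutes; passing a binding temperature of 30°C raises an error, mirroring the observation that 30°C was not adequate in the buffer of the Examples.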

When the DNA has been purified using conventional
purification tools, it is possible to employ simpler reaction
conditions in the in vitro transposition method. DNA of
sufficiently high purity can be prepared by passing the DNA
preparation through a resin of the type now commonly used in
the molecular biology laboratory, such as the Qiagen resin of
the Qiagen plasmid purification kit (Catalog No. 12162). When
such higher quality DNA is employed, CHAPS can be omitted from
the reaction buffer. When CHAPS is eliminated from the
reaction buffer, the reactants need not be diluted in the
manner described above. Also, the low temperature incubation
step noted above can be eliminated in favor of a single
incubation for cleavage and strand transfer at physiological
conditions. A three hour incubation at 37°C is sufficient.

Following the reaction and subsequent extraction steps,
transposition can be assayed by introducing the nucleic acid
reaction products into suitable bacterial host cells (e.g., E.
coli K-12 DH5α cells (recA-); commercially available from Life
Technologies (Gibco-BRL)), preferably by electroporation,
described by Dower et al., Nucleic Acids Res. 16:6127 (1988), and
monitoring for evidence of transposition, as is described
elsewhere herein.

Those persons skilled in the art will appreciate that
apart from the changes noted herein, the transposition reaction
can proceed under much the same conditions as would be found in
an in vivo reaction. Yet, the modified transposase described
herein so increases the level of transposition activity that it
is now possible to carry out this reaction in vitro where this
has not previously been possible. The rates of reaction are
even greater when the modified transposase is coupled with the
optimized buffer and temperature conditions noted herein.

In another aspect, the present invention is a preparation
of a modified Tn5 transposase enzyme that differs from wild
type Tn5 transposase in that it (1) binds to the repeat
sequences of the donor DNA with greater avidity than wild type
Tn5 transposase and (2) is less likely than the wild type
protein to assume an inactive multimeric form. An enzyme
having these requirements can be obtained from a bacterial host
cell containing an expressible gene for the modified enzyme
that is under the control of a promoter active in the host
cell. Genetic material that encodes the modified Tn5
transposase can be introduced (e.g., by electroporation) into
suitable bacterial host cells capable of supporting expression
of the genetic material. Known methods for overproducing and
preparing other Tn5 transposase mutants are suitably employed.
For example, Weinreich, M. D., et al., supra, describes a
suitable method for overproducing a Tn5 transposase. A second
method for purifying Tn5 transposase was described in de la
Cruz, N. B., et al., "Characterization of the Tn5 Transposase
and Inhibitor Proteins: A Model for the Inhibition of
Transposition," J. Bact. 175:6932-6938 (1993), also
incorporated herein by reference. It is noted that induction
can be carried out at temperatures below 37°C, which is the
temperature used by de la Cruz, et al. Temperatures at least in
the range of 33 to 37°C are suitable. The inventors have
determined that the method for preparing the modified
transposase of the present invention is not critical to success
of the method, as various preparation strategies have been used
with equal success.

Alternatively, the protein can be chemically synthesized,
in a manner known to the art, using the amino acid sequence
attached hereto as SEQ ID NO:2 as a guide. It is also possible
to prepare a genetic construct that encodes the modified
protein (and associated transcription and translation signals)
by using standard recombinant DNA methods familiar to molecular
biologists. The genetic material useful for preparing such
constructs can be obtained from existing Tn5 constructs, or can
be prepared using known methods for introducing mutations into
genetic material (e.g., random mutagenesis PCR or site-directed
mutagenesis) or some combination of both methods. The genetic
sequence that encodes the protein shown in SEQ ID NO:2 is set
forth in SEQ ID NO:1.

The nucleic acid and amino acid sequence of wild type Tn5
transposase are known and published. N.C.B.I. Accession Number
U00004 L19385, incorporated herein by reference.

In a preferred embodiment, the improved avidity of the
modified transposase for the repeat sequences for OE termini
(class (1) mutation) can be achieved by providing a lysine
residue at amino acid 54, which is glutamic acid in wild type
Tn5 transposase. The mutation strongly alters the preference
of the transposase for OE termini, as opposed to inside end
("IE") termini. The higher binding of this mutation, known as
EK54, to OE termini results in a transposition rate that is
about 10-fold higher than is seen with wild type transposase.
A similar change at position 54 to valine (mutant EV54) also
results in somewhat increased binding/transposition for OE
termini, as does a threonine-to-proline change at position 47
(mutant TP47; about 10-fold higher). It is believed that
other, comparable transposase mutations (in one or more amino
acids) that increase binding avidity for OE termini may also be
obtained which would function as well or better in the in vitro
assay described herein.

One of ordinary skill will also appreciate that changes to
the nucleotide sequences of the short repeat sequences of the
donor DNA may coordinate with other mutation(s) in or near the
binding region of the transposase enzyme to achieve the same
increased binding effect, and the resulting 5- to 50-fold
increase in transposition rate. Thus, while the applicants
have exemplified one case of a mutation that improves binding
of the exemplified transposase, it will be understood that
other mutations in the transposase, or in the short repeat
sequences, or in both, will also yield transposases that fall
within the scope and spirit of the present invention. A
suitable method for determining the relative avidity for Tn5 OE
termini has been published by Jilk, R. A., et al., "The
Organization of the Outside end of Transposon Tn5," J. Bact.
178:1671-79 (1996).

The transposase of the present invention is also less
likely than the wild type protein to assume an inactive
multimeric form. In the preferred embodiment, that class (2)
mutation from wild type can be achieved by modifying amino acid
372 (leucine) of wild type Tn5 transposase to a proline (and,
likewise, by modifying the corresponding DNA to encode proline).
This mutation, referred to as LP372, has previously been
characterized as a mutation in the dimerization region of the
transposase. Weinreich, et al., supra. It was noted by
Weinreich et al. that this mutation at position 372 maps to a
region shown previously to be critical for interaction with an
inhibitor of Tn5 transposition. The inhibitor is a protein
encoded by the same gene that encodes the transposase, but
which is truncated at the N-terminal end of the protein,
relative to the transposase. The approach of Weinreich et al.
for determining the extent to which multimers are formed is
suitable for determining whether a mutation falls within the
scope of this element.

It is thought that when wild type Tn5 transposase
multimerizes, its activity in trans is reduced. Presumably, a
mutation in the dimerization region reduces or prevents
multimerization, thereby reducing inhibitory activity and
leading to levels of transposition 5- to 50-fold higher than
are seen with the wild type transposase. The LP372 mutation
achieves about 10-fold higher transposition levels than wild
type. Likewise, other mutations (including mutations at one
or more amino acids) that reduce the ability of the transposase
to multimerize would also function in the same manner as the
single mutation at position 372, and would also be suitable in
a transposase of the present invention. It may also be
possible to reduce the ability of a Tn5 transposase to
multimerize without altering the wild type sequence in the so-
called dimerization region, for example by adding into the
system another protein or non-protein agent that blocks the
dimerization site. Alternatively, the dimerization region
could be removed entirely from the transposase protein.

As was noted above, the inhibitor protein, encoded in 
partially overlapping sequence with the transposase, can 
interfere with transposase activity. As such, it is desired 
that the amount of inhibitor protein be reduced over the amount 
observed in wild type in vivo. For the present assay, the 
transposase is used in purified form, and it may be possible to 
separate the transposase from the inhibitor (for example, 
according to differences in size) before use. However, it is 
also possible to genetically eliminate the possibility of 
having any contaminating inhibitor protein present by removing 
its start codon from the gene that encodes the transposase. 

An AUG in the wild type Tn5 transposase gene that encodes
methionine at transposase amino acid 56 is the first codon of
the inhibitor protein. However, it has already been shown that
replacement of the methionine at position 56 has no apparent
effect upon the transposase activity, but at the same time
prevents translation of the inhibitor protein, thus resulting
in a somewhat higher transposition rate. Weigand, T. W. and W.
S. Reznikoff, "Characterization of Two Hypertransposing Tn5
Mutants," J. Bact. 174:1229-1239 (1992), incorporated herein by
reference. In particular, the present inventors have replaced
the methionine with an alanine in the preferred embodiment (and
have replaced the methionine-encoding AUG codon with an
alanine-encoding GCC). A preferred transposase of the present
invention therefore includes an amino acid other than
methionine at amino acid position 56, although this change can
be considered merely technically advantageous (since it ensures
the absence of the inhibitor from the in vitro system) and not
essential to the invention (since other means can be used to
eliminate the inhibitor protein from the in vitro system).
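
The codon replacement described above (AUG to GCC at amino acid position 56) can be illustrated with a short Python sketch. It is offered only as an illustration: the function name substitute_codon is hypothetical, the codon table holds just the two codons named in the text, and the nine-base toy message stands in for the transposase coding sequence, which is not reproduced here.

```python
# Partial codon table: only the two codons named in the text.
CODON_TO_AMINO_ACID = {"AUG": "Met", "GCC": "Ala"}


def substitute_codon(mrna: str, position: int, new_codon: str) -> str:
    """Return mrna with the codon at 1-based amino acid `position` replaced."""
    start = 3 * (position - 1)
    if len(new_codon) != 3 or position < 1 or start + 3 > len(mrna):
        raise ValueError("codon position or replacement codon is out of range")
    return mrna[:start] + new_codon + mrna[start + 3:]


# Toy three-codon message (an assumption for illustration, not the Tn5 gene):
# the internal AUG at codon 2 plays the role of the inhibitor start codon.
toy_mrna = "GCUAUGGAA"
edited = substitute_codon(toy_mrna, 2, "GCC")
old_residue = CODON_TO_AMINO_ACID[toy_mrna[3:6]]
new_residue = CODON_TO_AMINO_ACID[edited[3:6]]
```

After the substitution the toy message no longer contains an internal AUG, which is the property relied upon above to prevent translation of the inhibitor protein.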

The most preferred transposase amino acid sequence known
to the inventors differs from the wild type at amino acid
positions 54, 56, and 372. The mutations at positions 54 and
372 separately contribute approximately a 10-fold increase to
the rate of transposition reaction in vivo. When the mutations
are combined using standard recombinant techniques into a
single molecule containing both classes of mutations, reaction
rates of at least about 100-fold higher than can be achieved
using wild type transposase are observed when the products of
the in vitro system are tested in vivo. The mutation at
position 56 does not directly affect the transposase activity.
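
The mutant names used in this specification (EK54, MA56, LP372, and so on) follow a simple pattern: the one-letter code of the wild type residue, the one-letter code of the replacement residue, and the amino acid position. The Python sketch below decodes such names. It is illustrative only; the function names are hypothetical, and the lookup table covers only the residues mentioned in this specification.

```python
import re
from typing import NamedTuple

# One-letter codes for the residues named in the text only.
ONE_LETTER = {
    "E": "glutamic acid", "K": "lysine", "M": "methionine", "A": "alanine",
    "L": "leucine", "P": "proline", "T": "threonine", "V": "valine",
}


class Mutation(NamedTuple):
    wild_type: str
    mutant: str
    position: int


def parse_mutation(name: str) -> Mutation:
    """Decode a name such as 'EK54' into (wild type, mutant, position)."""
    match = re.fullmatch(r"([A-Z])([A-Z])(\d+)", name)
    if match is None:
        raise ValueError(f"unrecognized mutation name: {name!r}")
    wild_type, mutant, position = match.groups()
    return Mutation(ONE_LETTER[wild_type], ONE_LETTER[mutant], int(position))


def parse_combination(name: str) -> list[Mutation]:
    """Decode a slash-separated combination such as 'EK54/MA56/LP372'."""
    return [parse_mutation(part) for part in name.split("/")]


preferred = parse_combination("EK54/MA56/LP372")
```

Here preferred lists the three changes of the most preferred transposase: glutamic acid to lysine at 54, methionine to alanine at 56, and leucine to proline at 372.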

Other mutants from wild type that are contemplated to be
likely to contribute to high transposase activity in vitro
include, but are not limited to, glutamic acid-to-lysine at
position 110, and glutamic acid-to-lysine at position 345.

It is, of course, understood that other changes apart from
these noted positions can be made to the modified transposase
(or to a construct encoding the modified transposase) without
adversely affecting the transposase activity. For example, it
is well understood that a construct encoding such a transposase
could include changes in the third position of codons such that
the encoded amino acid does not differ from that described
herein. In addition, certain codon changes have little or no
functional effect upon the transposition activity of the
encoded protein. Finally, other changes may be introduced
which provide yet higher transposition activity in the encoded
protein. It is also specifically envisioned that combinations
of mutations can be combined to encode a modified transposase
having even higher transposition activity than has been
exemplified herein. All of these changes are within the scope
of the present invention. It is noted, however, that a
modified transposase containing the EK110 and EK345 mutations
(both described by Weigand and Reznikoff, supra) had lower
transposase activity than a transposase containing either
mutation alone.

After the enzyme is prepared and purified, as described
supra, it can be used in the in vitro transposition reaction
described above to introduce any desired transposable element
from a donor DNA into a target DNA. The donor DNA can be
circular or can be linear. If the donor DNA is linear, it is
preferred that the repeat sequences flanking the transposable
element should not be at the termini of the linear fragment but
should rather include some DNA upstream and downstream from the
region flanked by the repeat sequences.

As was noted above, Tn5 transposition requires a pair of
eighteen or nineteen base long termini. The wild type Tn5
outside end (OE) sequence (5'-CTGACTCTTATACACAAGT-3') (SEQ ID
NO:7) has been described. It has been discovered that a
transposase-catalyzed in vitro transposition frequency at least
as high as that of wild type OE is achieved if the termini in a
construct include bases ATA at positions 10, 11, and 12,
respectively, as well as the nucleotides in common between wild
type OE and IE (e.g., at positions 1-3, 5-9, 13, 14, 16, and
optionally 19). The nucleotides at positions 4, 15, 17, and 18
can correspond to the nucleotides found at those positions in
either wild type OE or wild type IE. It is noted that the
transposition frequency can be enhanced over that of wild type
OE if the nucleotide at position 4 is a T. The importance of
these particular bases to transposition frequency has not
previously been identified.
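
The positional rule stated above can be expressed as a simple check against the wild type OE sequence. The following Python sketch is illustrative only and makes two simplifying assumptions: because the wild type IE sequence is not reproduced at this point, positions 4, 15, 17, and 18 are accepted with any nucleotide rather than being restricted to the OE or IE nucleotide, and the function name meets_terminus_rule is hypothetical.

```python
WILD_TYPE_OE = "CTGACTCTTATACACAAGT"  # SEQ ID NO:7, written 5' to 3'

# 1-based positions that must match wild type OE: the nucleotides shared by
# OE and IE (1-3, 5-9, 13, 14, 16) plus ATA at positions 10, 11, and 12.
REQUIRED_POSITIONS = [1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16]


def meets_terminus_rule(candidate: str) -> bool:
    """Check an 18 or 19 base terminus against the positional rule."""
    candidate = candidate.upper()
    if len(candidate) not in (18, 19) or set(candidate) - set("ACGT"):
        return False
    for position in REQUIRED_POSITIONS:
        if candidate[position - 1] != WILD_TYPE_OE[position - 1]:
            return False
    # Position 19 is optional; when present it should match wild type OE.
    return len(candidate) == 18 or candidate[18] == WILD_TYPE_OE[18]


wild_type_ok = meets_terminus_rule(WILD_TYPE_OE)
# Same as wild type except that positions 10-12 read GAT instead of ATA.
without_ata_ok = meets_terminus_rule("CTGACTCTTGATCACAAGT")
```

The wild type OE passes the check, whereas a terminus lacking ATA at positions 10-12 does not. A stricter check would additionally compare positions 4, 15, 17, and 18 against the OE and IE nucleotides.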

It is noted that these changes are not intended to
encompass every desirable modification to OE. As is described
elsewhere herein, these attributes of acceptable termini
modifications were identified by screening mutants having
randomized differences between IE and OE termini. While the
presence in the termini of certain nucleotides is shown herein
to be advantageous, other desirable terminal sequences may yet
be obtained by screening a larger array of degenerate mutants
that include changes at positions other than those tested
herein as well as mutants containing nucleotides not tested in
the described screening. In addition, it is clear to one
skilled in the art that if a different transposase is used, it
may still be possible to select other variant termini, more
compatible with that particular transposase.

Among the mutants shown to be desirable and within the
scope of the invention are two hyperactive mutant OE sequences
that were identified in vivo. Although presented here as
single stranded sequences, in fact, the wild type and mutant OE
sequences include complementary second strands. The first
hyperactive mutant, 5'-CTGTCTCTTATACACATCT-3' (SEQ ID NO:8),
differs from the wild type OE sequence at positions 4, 17, and
18 counting from the 5' end, but retains ATA at positions 10-
12. The second, 5'-CTGTCTCTTATACAGATCT-3' (SEQ ID NO:9),
differs from the wild type OE sequence at positions 4, 15, 17,
and 18, but also retains ATA at positions 10-12. These two
hyperactive mutant OE sequences differ from one another only at
position 15, where either G or C is present. OE-like activity
(or higher activity) is observed in a mutant sequence when it
contains ATA at positions 10, 11 and 12. It may be possible to
reduce the length of the OE sequence from 19 to 18 nucleotide
pairs with little or no effect.
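
The positional differences recited above can be confirmed by direct comparison of the sequences. The Python sketch below is illustrative only; the function name differing_positions is hypothetical, and the second mutant sequence is written with G at position 15, as inferred from the statement that the two mutants differ from one another only at position 15, where either G or C is present.

```python
def differing_positions(reference: str, variant: str) -> list[int]:
    """Return the 1-based positions at which two equal-length sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        index
        for index, (ref_base, var_base) in enumerate(zip(reference, variant), start=1)
        if ref_base != var_base
    ]


WILD_TYPE_OE = "CTGACTCTTATACACAAGT"  # SEQ ID NO:7
MUTANT_1 = "CTGTCTCTTATACACATCT"      # SEQ ID NO:8 (C at position 15)
MUTANT_2 = "CTGTCTCTTATACAGATCT"      # SEQ ID NO:9 (G at position 15, inferred)

first_vs_wild_type = differing_positions(WILD_TYPE_OE, MUTANT_1)
second_vs_wild_type = differing_positions(WILD_TYPE_OE, MUTANT_2)
between_mutants = differing_positions(MUTANT_1, MUTANT_2)
```

The comparison returns positions 4, 17, and 18 for the first mutant, positions 4, 15, 17, and 18 for the second, and position 15 alone when the two mutants are compared with each other.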

When one of the identified hyperactive mutant OE sequences
flanks a substrate DNA, the in vivo transposition frequency of
EK54/MA56 transposase is increased approximately 40-60 fold
over the frequency that is observed when wild type OE termini
flank the transposable DNA. The EK54/MA56 transposase is
already known to have an in vivo transposition frequency
approximately 8-10 fold higher than wild type transposase,
using wild type OE termini. Tn5 transposase having the
EK54/MA56 mutation is known to bind with greater avidity to OE
and with lesser avidity to the Tn5 inside ends (IE) than wild
type transposase.

WO 98/10077 PCT/US97/15941

A suitable mutant terminus in a construct for use in the
assays of the present invention is characterized biologically
as yielding more papillae per colony in a comparable time, say
68 hours, than is observed in colonies harboring wild type OE
in a comparable plasmid. Wild type OE can yield about 100
papillae per colony when measured 68 hours after plating in a
papillation assay using EK54/MA56 transposase, as is described
elsewhere herein. A preferred mutant would yield between about
200 and 3000 papillae per colony, and a more preferred mutant
between about 1000 and 3000 papillae per colony, when measured
in the same assay and time frame. A most preferred mutant
would yield between about 2000 and 3000 papillae per colony
when assayed under the same conditions. Papillation levels may
be even greater than 3000 per colony, although it is difficult
to quantitate at such levels.
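
The preference ranges above can be read as nested tiers. The following is a minimal sketch, assuming the "about" boundaries are treated as exact cut-offs and that a count is reported under the narrowest tier it satisfies; the function name and the "suitable" label for counts between the wild type level and 200 are illustrative assumptions, not terms of the specification.

```python
def papillation_tier(papillae_per_colony: float,
                     wild_type_level: float = 100.0) -> str:
    """Return the narrowest preference tier for a 68-hour papillae count.

    Cut-offs (200, 1000, 2000, 3000) are the approximate values quoted in
    the text; treating them as sharp thresholds is an assumption.
    """
    if papillae_per_colony > 3000:
        return "above 3000 (difficult to quantitate)"
    if papillae_per_colony >= 2000:
        return "most preferred"
    if papillae_per_colony >= 1000:
        return "more preferred"
    if papillae_per_colony >= 200:
        return "preferred"
    if papillae_per_colony > wild_type_level:
        return "suitable"  # more papillae than wild type OE, below 200
    return "wild type level or below"


if __name__ == "__main__":
    for count in (100, 150, 250, 1500, 2500):
        print(count, "->", papillation_tier(count))
```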

Transposition frequency is also substantially enhanced in
the in vitro transposition assay of the present invention when
substrate DNA is flanked by a preferred mutant OE sequence and
a most preferred mutant transposase (comprising EK54/MA56/LP372
mutations) is used. Under those conditions, essentially all of
the substrate DNA is converted into transposition products.

The rate of in vitro transposition observed using the
hyperactive termini is sufficiently high that, in the
experience of the inventors, there is no need to select for
transposition events. All colonies selected at random after
transformation for further study have shown evidence of
transposition events.

This advance can represent a significant savings in time
and laboratory effort. For example, it is particularly
advantageous to be able to improve in vitro transposition
frequency by modifying DNA rather than by modifying the
transposase because as transposase activity increases in host
cells, there is an increased likelihood that cells containing
the transposase are killed during growth as a result of
aberrant DNA transpositions. In contrast, DNA of interest
containing the modified OE termini can be grown in sources
completely separate from the transposase, thus not putting the
host cells at risk.

Without intending to limit the scope of this aspect of
this invention, it is apparent that the tested hyperactive
termini do not bind with greater avidity to the transposase
than do wild type OE termini. Thus, the higher transposition
frequency brought about by the hyperactive termini is not due
to enhanced binding to transposase.

The transposable element between the termini can include
any desired nucleotide sequence. The length of the
transposable element between the termini should be at least
about 50 base pairs, although smaller inserts may work. No
upper limit to the insert size is known. However, it is known
that a donor DNA portion of about 300 nucleotides in length can
function well. By way of non-limiting examples, the
transposable element can include a coding region that encodes a
detectable or selectable protein, with or without associated
regulatory elements such as a promoter, terminator, or the like.

If the element includes such a detectable or selectable
coding region without a promoter, it will be possible to
identify and map promoters in the target DNA that are uncovered
by transposition of the coding region into a position
downstream thereof, followed by analysis of the nucleic acid
sequences upstream from the transposition site.

Likewise, the element can include a primer binding site
that can be transposed into the target DNA, to facilitate
sequencing methods or other methods that rely upon the use of
primers distributed throughout the target genetic material.
Similarly, the method can be used to introduce a desired
restriction enzyme site or polylinker, or a site suitable for
another type of recombination, such as a cre-lox site, into the
target.

The invention can be better understood upon consideration
of the following examples, which are intended to be exemplary
and not limiting on the invention.

EXAMPLES 

To obtain the transposase modified at position 54, the
first third of the coding region from an existing DNA clone
that encodes the Tn5 transposase but not the inhibitor protein
(MA56) was mutagenized according to known methods and DNA
fragments containing the mutagenized portion were cloned to
produce a library of plasmid clones containing a full length
transposase gene. The clones making up the library were
transformed into E. coli K-12 strain MDW320 bacteria which were
plated and grown into colonies. Transposable elements provided
in the bacteria on a separate plasmid contained a defective
lacZ gene. The separate plasmid, pOXgen386, was described by
Weinreich, M. et al., "A functional analysis of the Tn5
Transposase: Identification of Domains Required for DNA Binding
and Dimerization," J. Mol. Biol. 241:166-177 (1993),
incorporated herein by reference. Colonies having elevated
transposase activity were selected by screening for blue (LacZ)
spots in white colonies grown in the presence of X-gal. This
papillation assay was described by Weinreich, et al. (1993),
supra. The 5'-most third of the Tn5 transposase genes from such
colonies was sequenced to determine whether a mutation was
responsible for the increase in transposase activity. It was
determined that a mutation at position 54 to lysine (K)
correlated well with the increase in transposase activity.
Plasmid pRZ5412-EK54 contains lysine at position 54 as well as
the described alanine at position 56.

The fragment containing the LP372 mutation was isolated
from pRZ4870 (Weinreich et al. (1994)) using restriction enzymes
NheI and BglII, and was ligated into NheI-BglII cut pRZ5412-EK54
to form a recombinant gene having the mutations at
positions 54, 56 and 372, as described herein and shown in SEQ
ID NO:1. The gene was tested and shown to have at least about
a one hundred fold increase in activity relative to wild type
Tn5 transposase. Each of the mutants at positions 54 and 372
alone had about a 10-fold increase in transposase activity.

The triple-mutant recombinant gene encoding the modified
transposase protein was transferred into commercial T7
expression vector pET-21D (commercially available from Novagen,
Madison, WI) by inserting a BspHI/SalI fragment into the NcoI/XhoI
fragment of the pET-21D vector. This cloning puts the modified
transposase gene under the control of the T7 promoter, rather
than the natural promoter of the transposase gene. The gene
product was overproduced in BL21(DE3)pLysS bacterial host
cells, which do not contain the binding site for the enzyme, by
specific induction in a fermentation process after cell growth
is complete. (See Studier, F. W., et al., "Use of T7 RNA
Polymerase to Direct Expression of Cloned Genes," Methods
Enzymol. 185:60-89 (1990)). The transposase was partially
purified using the method of de la Cruz, modified by inducing
overproduction at 33 or 37°C. After purification, the enzyme
preparation was stored at -70°C in a storage buffer (10%
glycerol, 0.7 M NaCl, 20 mM Tris-HCl, pH 7.5, 0.1% Triton-X100
and 10 mM CHAPS) until use. This storage buffer is to be
considered exemplary and not optimized.

A single plasmid (pRZTL1, Fig. 1) was constructed to serve
as both donor and target DNA in this Example. The complete
sequence of the pRZTL1 plasmid DNA is shown and described in
SEQ ID NO:3. Plasmid pRZTL1 contains two Tn5 19 base pair OE
termini in inverted orientation to each other. Immediately
adjacent to one OE sequence is a gene that would encode
tetracycline resistance, but for the lack of an upstream
promoter. However, the gene is expressed if the tetracycline
resistance gene is placed downstream of a transcribed region
(e.g., under the control of the promoter that promotes
transcription of the chloramphenicol resistance gene also
present on pRZTL1). Thus, the test plasmid pRZTL1 can be
assayed in vivo after the in vitro reaction to confirm that
transposition has occurred. The plasmid pRZTL1 also includes
an origin of replication in the transposable element, which
ensures that all transposition products are plasmids that can
replicate after introduction in host cells.

The following components were used in typical 20 µl in
vitro transposition reactions:

Modified transposase: 2 µl (approximately 0.1 µg
enzyme/µl) in storage buffer (10% glycerol, 0.7 M NaCl, 20
mM Tris-HCl, pH 7.5, 0.1% Triton-X100 and 10 mM CHAPS)

Donor/Target DNA: 18 µl (approximately 1-2 µg) in
reaction buffer (at final reaction concentrations of 0.1 M
potassium glutamate, 25 mM Tris acetate, pH 7.5, 10 mM
Mg²⁺-acetate, 50 ng/ml BSA, 0.5 mM β-mercaptoethanol, 2 mM
spermidine, 100 ng/ml tRNA).
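
Because the enzyme is added in its storage buffer, the storage buffer components are carried into the reaction at a tenth of their stock concentration (2 µl in 20 µl). The sketch below works through that dilution arithmetic only; the dictionary keys and the function name are illustrative assumptions, and the calculation ignores any contribution of the reaction buffer to the same components (for example Tris).

```python
# Storage buffer composition as quoted in the text.
STORAGE_BUFFER = {
    "glycerol (%)": 10.0,
    "NaCl (M)": 0.7,
    "Tris-HCl (mM)": 20.0,
    "Triton X-100 (%)": 0.1,
    "CHAPS (mM)": 10.0,
}


def carried_over(storage: dict, enzyme_ul: float = 2.0,
                 total_ul: float = 20.0) -> dict:
    """Concentration each storage-buffer component contributes to the reaction."""
    factor = enzyme_ul / total_ul
    return {name: conc * factor for name, conc in storage.items()}


if __name__ == "__main__":
    for name, conc in carried_over(STORAGE_BUFFER).items():
        print(f"{name}: {conc:.3g}")
```

Passing `total_ul=60.0` gives the carry-over after a reaction has been diluted with two further volumes of reaction buffer.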

At 20°C, the transposase was combined with pRZTL1 DNA for
about 60 minutes. Then, the reaction volume was increased by
adding two volumes of reaction buffer and the temperature was
raised to 37°C for 2-3 hours, whereupon cleavage and strand
transfer occurred.

Efficient in vitro transposition was shown to have
occurred by in vivo and by in vitro methods. In vivo, many
tetracycline-resistant colonies were observed after
transferring the nucleic acid product of the reaction into DH5α
bacterial cells. As noted, tetracycline resistance can only
arise in this system if the transposable element is transposed
downstream from an active promoter elsewhere on the plasmid. A
typical transposition frequency was 0.1% of cells that received
plasmid DNA, as determined by counting chloramphenicol
resistant colonies. However, this number underestimates the
total transposition event frequency because the detection
system limits the target to 1/16 of the total.
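
As a rough illustration of the size of that underestimate, the observed frequency can be scaled by the detectable fraction. This is a sketch under a stated assumption that is not made in the text, namely that insertions are spread evenly over detectable and undetectable targets; the function name is hypothetical.

```python
def estimated_total_frequency(observed_frequency: float,
                              detectable_fraction: float = 1 / 16) -> float:
    """Scale an observed frequency up by the fraction of targets that score.

    Assumes (hypothetically) that insertions are uniform over all targets.
    """
    if not 0 < detectable_fraction <= 1:
        raise ValueError("detectable_fraction must be in (0, 1]")
    return observed_frequency / detectable_fraction


if __name__ == "__main__":
    observed = 0.001  # 0.1% of cells that received plasmid DNA
    print(f"{estimated_total_frequency(observed):.1%}")
```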

Moreover, in vitro electrophoretic (1% agarose) and DNA
sequencing analyses of DNA isolated from purified colonies
revealed products of true transposition events, including both
intramolecular and intermolecular events. Results of typical
reactions using circular plasmid pRZTL1 substrates are shown in
Lanes 4 and 5 of Fig. 2. Lane 6 of Fig. 2 shows the results
obtained using linear plasmid pRZTL1 substrates.

The bands were revealed on 1% agarose gels by staining
with SYBR Green (FMC BioProducts) and were scanned on a
Fluorimager SI (Molecular Dynamics). In Figure 2, lane 1 shows
relaxed circle, linear, and closed circle versions of pRZTL1.
Lanes 2 and 3 show intramolecular and intermolecular
transposition products after in vitro transposition of pRZTL1,
respectively. The products were purified from electroporated
DH5α cells and were proven by size and sequence analysis to be
genuine transposition products. Lanes 4 and 5 represent
products of two independent in vitro reactions using a mixture
of closed and relaxed circular test plasmid substrates. In
lane 6, linear pRZTL1 (XhoI-cut) was the reaction substrate.
Lane 7 includes a BstEII digest of lambda DNA as a molecular
weight standard.

Fig. 3 reproduces lanes 4, 5, and 6 of Fig. 2 and shows an
analysis of various products, based upon secondary restriction
digest experiments and re-electroporation and DNA sequencing.
The released donor DNA corresponds to the fragment of pRZTL1
that contains the kanamycin resistance gene between the two OE
sequences, or, in the case of the linear substrate, the OE-XhoI
fragment. Intermolecular transposition products can be seen
only as relaxed DNA circles. Intramolecular transposition
products are seen as a ladder, which results from conversion of
the initial superhelicity of the substrate into DNA knots. The
reaction is efficient enough to achieve double transposition
events that are a combination of inter- and intramolecular
events.

A preliminary investigation was made into the nature of
the termini involved in a transposition reaction. Wild type
Tn5 OE and IE sequences were compared and an effort was
undertaken to randomize the nucleotides at each of the seven
positions of difference. A population of oligonucleotides
degenerate at each position of difference was created. Thus,
individual oligonucleotides in the population randomly included
either the nucleotide of the wild type OE or the wild type IE
sequence. In this scheme, 2⁷ (128) distinct oligonucleotides
were synthesized using conventional tools. These
oligonucleotides having sequence characteristics of both OE and
IE are referred to herein as OE/IE-like sequences. To avoid
nomenclature issues that arise because the oligonucleotides are
intermediate between OE and IE wild type sequences, the
applicants herein note that selected oligonucleotide sequences
are compared to the wild type OE rather than to wild type IE,
unless specifically noted. It will be appreciated by one
skilled in the art that if IE is selected as the reference
point, the differences are identical but are identified
differently.

The following depicts the positions (x) that were varied
in this mutant production scheme. WT OE is shown also at SEQ
ID NO: 7, WT IE at SEQ ID NO: 10.

5'-CTGACTCTTATACACAAGT-3' (WT OE)
      x     xxx  x xx     (positions of difference)
5'-CTGTCTCTTGATCAGATCT-3' (WT IE)

In addition to the degenerate OE/IE-like sequences, the
37-base long synthetic oligonucleotides also included terminal
SphI and KpnI restriction enzyme recognition and cleavage sites
for convenient cloning of the degenerate oligonucleotides into
plasmid vectors. Thus, a library of randomized termini was
created from a population of 2⁷ (128) types of degenerate
oligonucleotides.
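
The degenerate population can be enumerated directly from the two wild type sequences. The following is a minimal sketch of that enumeration for the 19-base OE/IE-like core only (the flanking SphI and KpnI sites of the 37-base oligonucleotides are not modeled); the function names are illustrative.

```python
from itertools import product

WT_OE = "CTGACTCTTATACACAAGT"  # SEQ ID NO: 7
WT_IE = "CTGTCTCTTGATCAGATCT"  # SEQ ID NO: 10


def positions_of_difference(a: str, b: str) -> list:
    """1-based positions at which two equal-length sequences differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


def degenerate_library(oe: str = WT_OE, ie: str = WT_IE) -> list:
    """All sequences carrying either the OE or the IE base at each varied position."""
    varied = positions_of_difference(oe, ie)
    library = []
    for choice in product((False, True), repeat=len(varied)):
        seq = list(ie)
        for pos, take_oe in zip(varied, choice):
            if take_oe:
                seq[pos - 1] = oe[pos - 1]
        library.append("".join(seq))
    return library


if __name__ == "__main__":
    print(positions_of_difference(WT_OE, WT_IE))
    print(len(degenerate_library()))
```

Both wild type sequences are members of the enumerated set, as are the two hyperactive termini of SEQ ID NO: 8 and SEQ ID NO: 9.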

Fig. 4 shows pRZ1496, the complete sequence of which is
presented as SEQ ID NO:11. The following features are noted in
the sequence:

Feature               Position
WT OE                 94-112
LacZ coding           135-3137
LacY coding           3199-4486
LacA coding           4553-6295
tetʳ coding           6669-9442
transposase coding    10683-12111 (Comp. Strand)
Cassette IE           12184-12225
colE1 sequence        12732-19182

The IE cassette shown in Fig. 4 was excised using SphI and
KpnI and was replaced, using standard cleavage and ligation
methods, by the synthetic termini cassettes comprising OE/IE-like
portions. Between the fixed wild type OE sequence and the
OE/IE-like cloned sequence, plasmid pRZ1496 comprises a gene
whose activity can be detected, namely LacZYA, as well as a
selectable marker gene, tetʳ. The LacZ gene is defective in
that it lacks suitable transcription and translation initiation
signals. The LacZ gene is transcribed and translated only when
it is transposed into a position downstream from such signals.

The resulting clones were transformed using
electroporation into dam⁻, LacZ⁻ bacterial cells, in this case
JCM101/pOXgen cells which were grown at 37°C in LB medium under
standard conditions. A dam⁻ strain is preferred because dam
methylation can inhibit IE utilization and wild type IE
sequences include two dam methylation sites. A dam⁻ strain
eliminates dam methylation as a consideration in assessing
transposition activity. The Tetʳ cells selected were LacZ⁻;
transposition-activated Lac expression was readily detectable
against a negative background. pOXgen is a non-essential F
factor derivative that need not be provided in the host cells.

In some experiments, the EK54/MA56 transposase was encoded
directly by the transformed pRZ1496 plasmid. In other
experiments, the pRZ1496 plasmid was modified by deleting a
unique HindIII/EagI fragment (nucleotides 9112-12083) from the
plasmid (see Fig. 4) to prevent transposase production. In the
latter experiments, the host cells were co-transformed with the
HindIII/EagI-deleted plasmid, termed pRZ5451 (Fig. 4), and with
an EK54/MA56 transposase-encoding chloramphenicol-resistant
plasmid. In some experiments, a comparable plasmid encoding a
wild type Tn5 transposase was used for comparison.

Transposition frequency was assessed by a papillation
assay that measured the number of blue spots (Lac producing
cells or "papillae") in an otherwise white colony. Transformed
cells were plated (approx. 50 colonies per plate) on Glucose
minimal Miller medium (Miller, J., Experiments in Molecular
Genetics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
(1972)) containing 0.3% casamino acids,
5-bromo-4-chloro-3-indolyl-β-D-galactoside (40 µg/ml) and
phenyl-β-D-galactoside (0.05%). The medium contained
tetracycline (15 µg/ml) and, where needed, chloramphenicol
(20 µg/ml). Colonies that
survived the selection were evaluated for transposition
frequency in vivo. Although colonies exhibiting superior
papillation were readily apparent to the naked eye, the number
of blue spots per colony was determined over a period of
several days (approximately 90 hours post-plating).

To show that the high-papillation phenotype was conferred
by the end mutations in the plasmids, colonies were re-streaked
if they appeared to have papillation levels higher than was
observed when wild type IE was included on the plasmid.
Colonies from the streaked culture plates were
themselves picked and cultured. DNA was obtained and purified
from the cultured cells, using standard protocols, and was
transformed again into "clean" JCM101/pOXgen cells.
Papillation levels were again compared with wild type
IE-containing plasmids in the above-noted assays, and consistent
results were observed.

To obtain DNA for sequencing of the inserted
oligonucleotide, cultures were grown from white portions of 117
hyperpapillating colonies, and DNA was prepared from each
colony using standard DNA miniprep methods. The DNA sequence
of the OE/IE-like portion of 117 clones was determined (42 from
transformations using pRZ1496 as the cloning vehicle; 75 from
transformations using pRZ5451 as the cloning vehicle). Only 29
unique mutants were observed. Many mutants were isolated
multiple times. All mutants that showed the highest
papillation frequencies contain OE-derived bases at positions
10, 11, and 12. When the OE-like bases at these positions were
maintained, it was impossible to measure the effect on
transposition of other changes, since the papillation level was
already extremely high.

One thousand five hundred seventy five colonies were
screened as described above. The likelihood that all 128
possible mutant sequences were screened was greater than 95%.
Thus, it is unlikely that other termini that contribute to a
greater transposition frequency will be obtained using the
tested transposase.
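
The coverage statement can be checked with a standard occupancy calculation. The sketch below is an illustration, not the applicants' own calculation: it assumes each screened colony carries one of the 128 sequences drawn independently and with equal probability, which ignores synthesis and cloning bias. The function name is hypothetical.

```python
from math import comb


def prob_all_types_seen(n_types: int, n_draws: int) -> float:
    """Probability that n_draws uniform draws cover every one of n_types.

    Inclusion-exclusion over the number k of types that are missed:
    P = sum_k (-1)^k * C(n_types, k) * (1 - k/n_types)^n_draws
    """
    return sum(
        (-1) ** k * comb(n_types, k) * (1 - k / n_types) ** n_draws
        for k in range(n_types + 1)
    )


if __name__ == "__main__":
    p = prob_all_types_seen(128, 1575)
    print(f"P(all 128 sequences screened) = {p:.4f}")
```

Under the uniform assumption the result is well above the 95% figure quoted; unequal representation of sequences in the library would lower it.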

Tables I and II report the qualitative papillation level
of mutant constructs carrying the indicated hybrid end 
sequences or the wild type OE or IE end sequences. In the 
tables, the sequence at each position of the terminus 
corresponds to wild type IE unless otherwise noted. The 
applicants intend that, while the sequences are presented in 
shorthand notation, one of ordinary skill can readily determine 
the complete 19 base pair sequence of every presented mutant, 
and this specification is to be read to include all such 
complete sequences. Table I includes data from trials where 
the EK54 transposase was provided in trans; Table II, from 
those trials where the EK54 transposase was provided in cis. 
Although a transposase provided in cis is more active in 
absolute terms than a transposase provided in trans, the cis or 
trans source of the transposase does not alter the relative in 
vivo transposition frequencies of the tested termini. 

Tables I and II show that every mutant that retains ATA at 
positions 10, 11, and 12, respectively, had an activity 
comparable to, or higher than, wild type OE, regardless of 
whether the wild type OE activity was medium (Table I, trans) 
or high (Table II, cis). Moreover, whenever that three-base
sequence in a mutant was not ATA, the mutant exhibited lower 
papillation activity than wild type OE. It was also noted that 
papillation is at least comparable to, and tends to be 
significantly higher than, wild type OE when position 4 is a T. 
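
The observation about positions 10-12 amounts to a simple rule on the 19-base terminus. The following sketch encodes that rule as stated for the tested mutants; it is a summary of the reported results rather than a predictive model, and the function names are illustrative.

```python
WT_OE = "CTGACTCTTATACACAAGT"  # SEQ ID NO: 7
WT_IE = "CTGTCTCTTGATCAGATCT"  # SEQ ID NO: 10


def retains_ata(terminus: str) -> bool:
    """True when positions 10, 11 and 12 (1-based) read A, T, A."""
    return terminus[9:12] == "ATA"


def papillation_relative_to_oe(terminus: str) -> str:
    """Qualitative outcome reported in Tables I and II for tested termini."""
    if retains_ata(terminus):
        return "comparable to or higher than wild type OE"
    return "lower than wild type OE"


if __name__ == "__main__":
    for name, seq in (("WT OE", WT_OE), ("WT IE", WT_IE),
                      ("SEQ ID NO: 8", "CTGTCTCTTATACACATCT")):
        print(name, "->", papillation_relative_to_oe(seq))
```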

Quantitative analysis of papillation levels was difficult, 
beyond the comparative levels shown (very low, low, medium, 
medium high, and high). However, one skilled in the art can
readily note the papillation level of OE and can recognize 
those colonies having comparable or higher levels. It is 
helpful to observe the papillae with magnification. 

The number of observed papillae increased over time, as is 
shown in Figs. 5-7, which roughly quantitate the papillation
observed in cells transformed separately with 9 clones 
containing either distinct synthetic termini cassettes or wild 
type OE or IE termini. In these 3 figures, each mutant is 
identified by its differences from the wild type IE sequence. 
Note that, among the tested mutants, only mutant 10A/11T/12A
had a higher papillation level than wild type OE.
That mutant (which would be called mutant 4/15/17/18 when OE is
the reference sequence) was the only mutant of those shown in
Figs. 5-7 that retained the nucleotides ATA at positions 10,
11, and 12. Figs. 5 (y-axis: 0 - 1500 papillae) and 6 (y-axis:
0 - 250 papillae) show papillation using various mutants plus
IE and OE controls and the EK54/MA56 enzyme. Fig. 7 (y-axis: 0
- 250 papillae) shows papillation when the same mutant
sequences were tested against the wild type (more properly,
MA56) transposase. The 10A/11T/12A mutant (SEQ ID NO: 9)
yielded significantly more papillae (approximately 3000) in a
shorter time (68 hours) with EK54/MA56 transposase than was
observed even after 90 hours with the WT OE (approximately
1500). A single OE-like nucleotide at position 15 on an
IE-like background also increased papillation frequency.

In vivo transposition frequency was also quantitated in a
tetracycline-resistance assay using two sequences having high
levels of hyperpapillation. These sequences were
5'-CTGTCTCTTATACACATCT-3' (SEQ ID NO: 8), which differs from the
wild type OE sequence at positions 4, 17, and 18, counting from
the 5' end, and 5'-CTGTCTCTTATACAGATCT-3' (SEQ ID NO: 9), which
differs from the wild type OE at positions 4, 15, 17, and 18.
These sequences are considered the preferred mutant termini in
an assay using a transposase that contains EK54/MA56 or a
transposase that contains MA56. Each sequence was separately
engineered into pRZTL1 in place of the plasmid's two wild type
OE sequences. A PCR-amplified fragment containing the desired
ends flanking the kanamycin resistance gene was readily cloned
into the large HindIII fragment of pRZTL1. The resulting
plasmids are identical to pRZTL1 except at the indicated
termini. For comparison, pRZTL1 and a derivative of pRZTL1
containing two wild type IE sequences were also tested. In the
assay, JCM101/pOXgen cells were co-transformed with a test
plasmid (pRZTL1 or derivative) and a high copy number ampʳ
plasmid that encoded either the EK54/MA56 transposase or wild
type (MA56) transposase. The host cells become tetracycline
resistant only when a transposition event brings the Tetʳ gene
into downstream proximity with a suitable transcriptional
promoter elsewhere on a plasmid or on the chromosome. The
total number of cells that received the test plasmids was
determined by counting chloramphenicol resistant, ampicillin
resistant colonies. Transposition frequency was calculated by
taking the ratio of tetʳ/camʳ ampʳ colonies. An approximately 40 to
60 fold increase over wild type OE in in vivo transposition was
observed when using either of the mutant termini and EK54/MA56
transposase. Of the two preferred mutant termini, the one
containing mutations at three positions relative to the wild
type OE sequence yielded a higher increase.
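
The two naming conventions used for the mutant termini (differences from OE, or differences from IE) and the frequency ratio used in this assay can be sketched as follows. The helper names and the colony counts in the usage lines are hypothetical and are given only to show the arithmetic.

```python
WT_OE = "CTGACTCTTATACACAAGT"      # SEQ ID NO: 7
WT_IE = "CTGTCTCTTGATCAGATCT"      # SEQ ID NO: 10
SEQ_ID_NO_8 = "CTGTCTCTTATACACATCT"
SEQ_ID_NO_9 = "CTGTCTCTTATACAGATCT"


def differing_positions(reference: str, seq: str) -> list:
    """1-based positions at which seq differs from the reference."""
    return [i + 1 for i, (r, s) in enumerate(zip(reference, seq)) if r != s]


def mutant_label(reference: str, seq: str) -> str:
    """Label such as '10A/11T/12A': position plus the base present in seq."""
    return "/".join(f"{p}{seq[p - 1]}" for p in differing_positions(reference, seq))


def transposition_frequency(tet_resistant: int, cam_amp_resistant: int) -> float:
    """Ratio of tet-resistant colonies to cam/amp-resistant colonies."""
    return tet_resistant / cam_amp_resistant


if __name__ == "__main__":
    print(differing_positions(WT_OE, SEQ_ID_NO_8))  # relative to OE
    print(mutant_label(WT_IE, SEQ_ID_NO_9))         # relative to IE
    # Hypothetical counts, for illustration only:
    print(transposition_frequency(30, 1_000_000))
```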

As is shown in Fig. 8, which plots the tested plasmid
against the transposition frequency (× 10⁻⁸), little
transposition was seen when the test plasmid included two IE
termini. Somewhat higher transposition was observed when the
test plasmid included two OE termini, particularly when the
EK54/MA56 transposase was employed. In striking contrast, the
combination of the EK54/MA56 transposase with either of the
preferred selected ends (containing OE-like bases only at
positions 10, 11, and 12, or positions 10, 11, 12, and 15)
yielded a great increase in in vivo transposition over wild
type OE termini.

The preferred hyperactive mutant terminus having the most
preferred synthetic terminus sequence 5'-CTGTCTCTTATACACATCT-3'
(SEQ ID NO: 8) was provided in place of both WT OE termini in
pRZTL1 (Fig. 1) and was tested in the in vitro transposition
assay of the present invention using the triple mutant
transposase described herein. This mutant terminus was chosen
for further in vitro analysis because its transposition
frequency was higher than for the second preferred synthetic
terminus and because it has no dam methylation sites, so dam
methylation no longer affects transposition frequency. In
contrast, the 4/15/17/18 mutant does have a dam methylation
site.
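
Dam methylase acts at the sequence GATC, so the presence of dam sites in each terminus can be confirmed by a simple scan. The sketch below does this for the four termini discussed; the function name is illustrative.

```python
TERMINI = {
    "WT OE (SEQ ID NO: 7)": "CTGACTCTTATACACAAGT",
    "WT IE (SEQ ID NO: 10)": "CTGTCTCTTGATCAGATCT",
    "SEQ ID NO: 8": "CTGTCTCTTATACACATCT",
    "SEQ ID NO: 9": "CTGTCTCTTATACAGATCT",
}


def dam_sites(seq: str) -> list:
    """1-based start positions of every GATC occurrence in seq."""
    return [i + 1 for i in range(len(seq) - 3) if seq[i:i + 4] == "GATC"]


if __name__ == "__main__":
    for name, seq in TERMINI.items():
        print(name, dam_sites(seq))
```

The scan finds two sites in wild type IE, one in SEQ ID NO: 9 and none in SEQ ID NO: 8 or wild type OE, in agreement with the text.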

In a preliminary experiment, CHAPS was eliminated from the
reaction, but the pre-incubation step was used. The reaction
was pre-incubated for 1 hour at 20°C, then diluted two times,
and then incubated for 3 hours at 37°C. About 0.5 µg of DNA
and 0.4 µg of transposase were used. The transposition products
were observed on a gel. With the mutant termini, very little
of the initial DNA was observed. Numerous bands representing
primary and secondary transposition reaction products were
observed. The reaction mixtures were transformed into DH5α
cells and were plated on chloramphenicol-, tetracycline-, or
kanamycin-containing plates.

Six hundred forty chloramphenicol-resistant colonies were
observed. Although these could represent unreacted plasmid,
all such colonies tested (n=12) were sensitive to kanamycin,
which indicates a loss of donor backbone DNA. All twelve
colonies also included plasmids of varied size; 9 of the 12
were characterized as deletion-inversions, the remaining 3 were
simple deletions. Seventy nine tetracycline-resistant
colonies were observed, which indicated an activation of the
tetʳ gene by transposition.

Eleven kanamycin resistant colonies were observed. This
indicated a low percentage of remaining plasmids carrying the
donor backbone DNA.

In a second, similar test, about 1 µg of plasmid DNA and
0.2 µg transposase were used. In this test, the reaction was
incubated without CHAPS at 37°C for 3 hours without
pre-incubation or dilution. Some initial DNA was observed in the
gel after the 3 hour reaction. After overnight incubation,
only transposition products were observed.

The 3 hour reaction products were transformed into DH5α
cells and plated as described. About 50% of the
chloramphenicol resistant colonies were sensitive to kanamycin
and were presumably transposition products.

The invention is not intended to be limited to the
foregoing examples, but to encompass all such modifications and
variations as come within the scope of the appended claims.
It is envisioned that, in addition to the uses specifically
noted herein, other applications will be apparent to the
skilled molecular biologist. In particular, methods for
introducing desired mutations into prokaryotic or eukaryotic
DNA are very desirable. For example, at present it is
difficult to knock out a functional eukaryotic gene by
homologous recombination with an inactive version of the gene
that resides on a plasmid. The difficulty arises from the need
to flank the gene on the plasmid with extensive upstream and
downstream sequences. Using this system, however, an
inactivating transposable element containing a selectable
marker gene (e.g., neo) can be introduced in vitro into a
plasmid that contains the gene that one desires to inactivate.
After transposition, the products can be introduced into
suitable host cells. Using standard selection means, one can
recover only cell colonies that contain a plasmid having the
transposable element. Such plasmids can be screened, for
example by restriction analysis, to recover those that contain
a disrupted gene. Such clones can then be introduced directly
into eukaryotic cells for homologous recombination and
selection using the same marker gene.

Also, one can use the system to readily insert a
PCR-amplified DNA fragment into a vector, thus avoiding traditional
cloning steps entirely. This can be accomplished by (1)
providing a suitable pair of PCR primers containing OE termini
adjacent to the sequence-specific parts of the primers, (2)
performing standard PCR amplification of a desired nucleic acid
fragment, and (3) performing the in vitro transposition reaction of
the present invention using the double-stranded products of PCR
amplification as the donor DNA.
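
Step (1) can be sketched as simple string assembly. The example below assumes that the 19-base terminus is placed at the 5' end of each primer, so that position 1 of the terminus becomes the outermost base of the amplified fragment at both ends; that orientation, the function name, and the sequence-specific parts shown are illustrative assumptions and are not taken from the specification.

```python
WT_OE = "CTGACTCTTATACACAAGT"  # SEQ ID NO: 7


def oe_tailed_primer(specific_part: str, terminus: str = WT_OE) -> str:
    """Return a primer (written 5'->3') with the terminus 5' of the specific part."""
    specific = specific_part.upper()
    if not specific or set(specific) - set("ACGT"):
        raise ValueError("specific_part must be a non-empty A/C/G/T string")
    return terminus + specific


if __name__ == "__main__":
    # Hypothetical sequence-specific parts for the two ends of a fragment.
    forward = oe_tailed_primer("gctagcatgactggtggaca")
    reverse = oe_tailed_primer("ttagaaaaactcatcgagca")
    print(forward)
    print(reverse)
```

A hyperactive terminus such as SEQ ID NO: 8 can be supplied through the `terminus` argument in place of the wild type OE sequence.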

SEQUENCE LISTING

(1) GENERAL INFORMATION:

     (i) APPLICANT: Reznikoff, William S
                    Goryshin, Igor Y
                    Zhou, Hong

    (ii) TITLE OF INVENTION: System for In Vitro Transposition

   (iii) NUMBER OF SEQUENCES: 11

    (iv) CORRESPONDENCE ADDRESS:
          (A) ADDRESSEE: Quarles & Brady
          (B) STREET: 1 South Pinckney Street
          (C) CITY: Madison
          (D) STATE: WI
          (E) COUNTRY: USA
          (F) ZIP: 53703

     (v) COMPUTER READABLE FORM:
          (A) MEDIUM TYPE: Floppy disk
          (B) COMPUTER: IBM PC compatible
          (C) OPERATING SYSTEM: PC-DOS/MS-DOS
          (D) SOFTWARE: PatentIn Release #1.0, Version #1.30

    (vi) CURRENT APPLICATION DATA:
          (A) APPLICATION NUMBER:
          (B) FILING DATE:
          (C) CLASSIFICATION:

  (viii) ATTORNEY/AGENT INFORMATION:
          (A) NAME: Berson, Bennett J
          (B) REGISTRATION NUMBER: 37094
          (C) REFERENCE/DOCKET NUMBER: 960296.94142

    (ix) TELECOMMUNICATION INFORMATION:
          (A) TELEPHONE: 608/251-5000
          (B) TELEFAX: 608-251-9166

(2) INFORMATION FOR SEQ ID NO:1:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 1534 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: other nucleic acid
          (A) DESCRIPTION: /desc = "Gene encoding modified Tn5
               transposase enzyme"

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 93..1523

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

CTGACTCTTA TACACAAGTA GCGTCCTGAA CGGAACCTTT CCCGTTTTCC AGGATCTGAT  60

CTTCCATGTG ACCTCCTAAC ATGGTAACGT TC ATG ATA ACT TCT GCT CTT CAT  113
                                    Met Ile Thr Ser Ala Leu His
                                    1               5

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5 CGT GCG GCC GAC TGG GCT AAA TCT GTG TTC TCT TCG GCG GCG CTG GGT 161 

Arg Ala Ala Asp Trp Ala Lys Ser Val Phe Ser Ser Ala Ala Leu Gly 
10 15 20 

GAT CCT CGC CGT ACT GCC CGC TTG GTT AAC GTC GCC GCC CAA TTG GCA 209 
Asp Pro Arg Arg Thr Ala Arg Leu Val Asn Val Ala Ala Gin Leu Ala 
10 25 30 35 

AAA TAT TCT GGT AAA TCA ATA ACC ATC TCA TCA GAG GGT AGT AAA GCC 2 57 

Lys Tyr Ser Gly Lys Ser He Thr He Ser Ser Glu Gly Ser Lys Ala 
40 45 50 55 



GCC CAG GAA GGC GCT TAC CGA TTT ATC CGC AAT CCC AAC GTT TCT GCC 305 
15 Ala Gin Glu Gly Ala Tyr Arg Phe He Arg Asn Pro Asn val Ser Ala 

60 65 70 

GAG GCG ATC AGA AAG GCT GGC GCC ATG CAA ACA GTC AAG TTG GCT CAG 353 
Glu Ala He Arg Lys Ala Gly Ala Met Gin Thr Val Lys Leu Ala Gin 
75 80 85 

20 GAG TTT CCC GAA CTG CTG GCC ATT GAG GAC ACC ACC TCT TTG AGT TAT 401 

Glu Phe Pro Glu Leu Leu Ala He Glu Asp Thr Thr Ser Leu Ser Tyr 
90 95 100 

CGC CAC CAG GTC GCC GAA GAG CTT GGC AAG CTG GGC TCT ATT CAG GAT 44 9 

Arg His Gin Val Ala Glu Glu Leu Gly Lys Leu Gly Ser He Gin Asp 
25 105 HO I 15 

AAA TCC CGC GGA TGG TGG GTT CAC TCC GTT CTC TTG CTC GAG GCC ACC 497 
Lys Ser Arg Gly Trp Trp Val His Ser Val Leu Leu Leu Glu Ala Thr 
120 125 130 135 

ACA TTC CGC ACC GTA GGA TTA CTG CAT CAG GAG TGG TGG ATG CGC CCG 545 
30 Thr Phe Arg Thr Val Gly Leu Leu His Gin Glu Trp Trp Met Arg Pro 

140 145 15° 

GAT GAC CCT GCC GAT GCG GAT GAA AAG GAG AGT GGC AAA TGG CTG GCA 593 
Asp Asp Pro Ala Asp Ala Asp Glu Lys Glu Ser Gly Lys Trp Leu Ala 
1 55 " 160 165 

35 GCG GCC GCA ACT AGC CGG TTA CGC ATG GGC AGC ATG ATG AGC AAC GTG 641 

Ala Ala Ala Thr Ser Arg Leu Arg Met Gly Ser Met Met Ser Asn Val 
170 175 1B0 

ATT GCG GTC TGT GAC CGC GAA GCC GAT ATT CAT GCT TAT CTG CAG GAC 689 
He Ala Val Cys Asp Arg Glu Ala Asp He His Ala Tyr Leu Gin Asp 
40 185 190 195 

AGG CTG GCG CAT AAC GAG CGC TTC GTG GTG CGC TCC AAG CAC CCA CGC 737 
Arg Leu Ala His Asn Glu Arg Phe Val Val Arg Ser Lys His Pro Arg 
200 205 210 215 

AAG GAC GTA GAG TCT GGG TTG TAT CTG ATC GAC CAT CTG AAG AAC CAA 785 
45 Lys Asp Val Glu Ser Gly Leu Tyr Leu He Asp His Leu Lys Asn Gin 

220 225 230 

CCG GAG TTG GGT GGC TAT CAG ATC AGC ATT CCG CAA AAG GGC GTG GTG 833 
Pro Glu Leu Gly Gly Tyr Gin He Ser He Pro Gin Lys Gly Val Val 
235 240 245 

50 GAT AAA CGC GGT AAA CGT AAA AAT CGA CCA GCC CGC AAG GCG AGC TTG 881 

Asp Lys Arg Gly Lys Arg Lys Asn Arg Pro Ala Arg Lys Ala Ser Leu 
250 255 260 

AGC CTG CGC AGT GGG CGC ATC ACG CTA AAA CAG GGG AAT ATC ACG CTC 929
Ser Leu Arg Ser Gly Arg Ile Thr Leu Lys Gln Gly Asn Ile Thr Leu
265 270 275

AAC GCG GTG CTG GCC GAG GAG ATT AAC CCG CCC AAG GGT GAG ACC CCG 977
Asn Ala Val Leu Ala Glu Glu Ile Asn Pro Pro Lys Gly Glu Thr Pro
280 285 290 295

TTG AAA TGG TTG TTG CTG ACC GGC GAA CCG GTC GAG TCG CTA GCC CAA 1025
Leu Lys Trp Leu Leu Leu Thr Gly Glu Pro Val Glu Ser Leu Ala Gln
300 305 310

GCC TTG CGC GTC ATC GAC ATT TAT ACC CAT CGC TGG CGG ATC GAG GAG 1073
Ala Leu Arg Val Ile Asp Ile Tyr Thr His Arg Trp Arg Ile Glu Glu
315 320 325

TTC CAT AAG GCA TGG AAA ACC GGA GCA GGA GCC GAG AGG CAA CGC ATG 1121
Phe His Lys Ala Trp Lys Thr Gly Ala Gly Ala Glu Arg Gln Arg Met
330 335 340

GAG GAG CCG GAT AAT CTG GAG CGG ATG GTC TCG ATC CTC TCG TTT GTT 1169
Glu Glu Pro Asp Asn Leu Glu Arg Met Val Ser Ile Leu Ser Phe Val
345 350 355

GCG GTC AGG CTG TTA CAG CTC AGA GAA AGC TTC ACG CCG CCG CAA GCA 1217
Ala Val Arg Leu Leu Gln Leu Arg Glu Ser Phe Thr Pro Pro Gln Ala
360 365 370 375

[nucleotide row illegible in source] 1265
Leu Arg Ala Gln Gly Leu Leu Lys Glu Ala Glu His Val Glu Ser Gln
380 385 390

TPC GCA GAA ACG GTG CTG ACC CCG GAT GAA TGT CAG CTA CTG GGC TAT 1313
Ser Ala Glu Thr Val Leu Thr Pro Asp Glu Cys Gln Leu Leu Gly Tyr
395 400 405

CTG GAC AAG GGA AAA CGC AAG CGC AAA GAG AAA GCA GGT AGC TTG CAG 1361
Leu Asp Lys Gly Lys Arg Lys Arg Lys Glu Lys Ala Gly Ser Leu Gln
410 415 420

[nucleotide row illegible in source] 1409
Trp Ala Tyr Met Ala Ile Ala Arg Leu Gly Gly Phe Met Asp Ser Lys
425 430 435

[nucleotide row illegible in source] 1457
Arg Thr Gly Ile Ala Ser Trp Gly Ala Leu Trp Glu Gly Trp Glu Ala
440 445 450 455

CTG CAA AGT AAA CTG GAT GGC TTT CTT GCC GCC AAG GAT CTG ATG GCG 1505
Leu Gln Ser Lys Leu Asp Gly Phe Leu Ala Ala Lys Asp Leu Met Ala
460 465 470

CAG GGG ATC AAG ATC TGA TCAAGAGACA G 1534
Gln Gly Ile Lys Ile *
475
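
The coordinates given for SEQ ID NO:1 can be cross-checked with simple arithmetic. The following is a minimal illustrative sketch and is not part of the listing itself; the function name `cds_codons` is hypothetical, and the sketch assumes that the LOCATION field uses 1-based, inclusive coordinates.

```python
def cds_codons(start: int, end: int) -> int:
    """Return the codon count of a CDS given 1-based, inclusive coordinates."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"CDS length {length} is not a multiple of three")
    return length // 3


# Values copied from the SEQ ID NO:1 header.
TOTAL_BP = 1534
CDS_START, CDS_END = 93, 1523

codons = cds_codons(CDS_START, CDS_END)  # counts the terminal stop codon
leader = CDS_START - 1                   # bases upstream of the ATG
trailer = TOTAL_BP - CDS_END             # bases downstream of the stop codon

print(codons, leader, trailer)  # prints: 477 92 11
```

Under these assumptions the 477 codons agree with the 477 positions listed for SEQ ID NO:2, where the final position is the stop symbol.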

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 477 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Ile Thr Ser Ala Leu His Arg Ala Ala Asp Trp Ala Lys Ser Val
1 5 10 15

Phe Ser Ser Ala Ala Leu Gly Asp Pro Arg Arg Thr Ala Arg Leu Val
20 25 30

Asn Val Ala Ala Gln Leu Ala Lys Tyr Ser Gly Lys Ser Ile Thr Ile
35 40 45

Ser Ser Glu Gly Ser Lys Ala Ala Gln Glu Gly Ala Tyr Arg Phe Ile
50 55 60

Arg Asn Pro Asn Val Ser Ala Glu Ala Ile Arg Lys Ala Gly Ala Met
65 70 75 80

Gln Thr Val Lys Leu Ala Gln Glu Phe Pro Glu Leu Leu Ala Ile Glu
85 90 95

Asp Thr Thr Ser Leu Ser Tyr Arg His Gln Val Ala Glu Glu Leu Gly
100 105 110

Lys Leu Gly Ser Ile Gln Asp Lys Ser Arg Gly Trp Trp Val His Ser
115 120 125

Val Leu Leu Leu Glu Ala Thr Thr Phe Arg Thr Val Gly Leu Leu His
130 135 140

Gln Glu Trp Trp Met Arg Pro Asp Asp Pro Ala Asp Ala Asp Glu Lys
145 150 155 160

Glu Ser Gly Lys Trp Leu Ala Ala Ala Ala Thr Ser Arg Leu Arg Met
165 170 175

Gly Ser Met Met Ser Asn Val Ile Ala Val Cys Asp Arg Glu Ala Asp
180 185 190

Ile His Ala Tyr Leu Gln Asp Arg Leu Ala His Asn Glu Arg Phe Val
195 200 205

Val Arg Ser Lys His Pro Arg Lys Asp Val Glu Ser Gly Leu Tyr Leu
210 215 220

Ile Asp His Leu Lys Asn Gln Pro Glu Leu Gly Gly Tyr Gln Ile Ser
225 230 235 240

Ile Pro Gln Lys Gly Val Val Asp Lys Arg Gly Lys Arg Lys Asn Arg
245 250 255

Pro Ala Arg Lys Ala Ser Leu Ser Leu Arg Ser Gly Arg Ile Thr Leu
260 265 270

Lys Gln Gly Asn Ile Thr Leu Asn Ala Val Leu Ala Glu Glu Ile Asn
275 280 285

Pro Pro Lys Gly Glu Thr Pro Leu Lys Trp Leu Leu Leu Thr Gly Glu
290 295 300

Pro Val Glu Ser Leu Ala Gln Ala Leu Arg Val Ile Asp Ile Tyr Thr
305 310 315 320

His Arg Trp Arg Ile Glu Glu Phe His Lys Ala Trp Lys Thr Gly Ala
325 330 335

Gly Ala Glu Arg Gln Arg Met Glu Glu Pro Asp Asn Leu Glu Arg Met
340 345 350

Val Ser Ile Leu Ser Phe Val Ala Val Arg Leu Leu Gln Leu Arg Glu
355 360 365

Ser Phe Thr Pro Pro Gln Ala Leu Arg Ala Gln Gly Leu Leu Lys Glu
370 375 380

Ala Glu His Val Glu Ser Gln Ser Ala Glu Thr Val Leu Thr Pro Asp
385 390 395 400

Glu Cys Gln Leu Leu Gly Tyr Leu Asp Lys Gly Lys Arg Lys Arg Lys
405 410 415

Glu Lys Ala Gly Ser Leu Gln Trp Ala Tyr Met Ala Ile Ala Arg Leu
420 425 430

Gly Gly Phe Met Asp Ser Lys Arg Thr Gly Ile Ala Ser Trp Gly Ala
435 440 445

Leu Trp Glu Gly Trp Glu Ala Leu Gln Ser Lys Leu Asp Gly Phe Leu
450 455 460

Ala Ala Lys Asp Leu Met Ala Gln Gly Ile Lys Ile *
465 470 475
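
The relationship between the coding region of SEQ ID NO:1 and the protein of SEQ ID NO:2 can be illustrated by translating a few codons. The sketch below is illustrative only and assumes the standard genetic code; the names `translate` and `three_letter` are hypothetical helpers, and the two codon strings are copied from the first and last coding rows of SEQ ID NO:1.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "*",
}


def translate(dna: str) -> str:
    """Translate a DNA string (whitespace ignored) into one-letter residues."""
    seq = "".join(dna.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def three_letter(protein: str) -> str:
    """Render one-letter residues in the three-letter style of the listing."""
    return " ".join(THREE_LETTER[aa] for aa in protein)


first = translate("ATG ATA ACT TCT GCT CTT CAT")   # codons 1 to 7
last = translate("CAG GGG ATC AAG ATC TGA")        # codons 472 to 477

print(three_letter(first))  # prints: Met Ile Thr Ser Ala Leu His
print(three_letter(last))   # prints: Gln Gly Ile Lys Ile *
```

Both outputs agree with the corresponding residues of SEQ ID NO:2.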

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5838 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Plasmid DNA"

(vii) IMMEDIATE SOURCE:
(B) CLONE: pRZTL1

(ix) FEATURE:
(A) NAME/KEY: insertion_seq
(B) LOCATION: 1..19

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 77..1267
(D) OTHER INFORMATION: /function= "tetracycline resistance"

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: [illegible in source]
(D) OTHER INFORMATION: /function= "chloramphenicol resistance"

(ix) FEATURE:
(A) NAME/KEY: insertion_seq
(B) LOCATION: 4564..4582

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 4715..5530
(D) OTHER INFORMATION: /function= "kanamycin resistance"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

CTGACTCTTA TACACAAGTA AGCTTTAATG CGGTAGTTTA TCACAGTTAA ATTGCTAACG 60

CAGTCAGGCA CCGTGT ATG AAA TCT AAC AAT GCG CTC ATC GTC ATC CTC 109
Met Lys Ser Asn Asn Ala Leu Ile Val Ile Leu
480 485

GGC ACC GTC ACC CTG GAT GCT GTA GGC ATA GGC TTG GTT ATG CCG GTA 157
Gly Thr Val Thr Leu Asp Ala Val Gly Ile Gly Leu Val Met Pro Val
490 495 500

CTG CCG GGC CTC TTG CGG GAT ATC GTC CAT TCC GAC AGC ATC GCC AGT 205
Leu Pro Gly Leu Leu Arg Asp Ile Val His Ser Asp Ser Ile Ala Ser
505 510 515 520

CAC TAT GGC GTG CTG CTA GCG CTA TAT GCG TTG ATG CAA TTT CTA TGC 253
His Tyr Gly Val Leu Leu Ala Leu Tyr Ala Leu Met Gln Phe Leu Cys
525 530 535

GCA CCC GTT CTC GGA GCA CTG TCC GAC CGC TTT GGC CGC CGC CCA GTC 301
Ala Pro Val Leu Gly Ala Leu Ser Asp Arg Phe Gly Arg Arg Pro Val
540 545 550

CTG CTC GCT TCG CTA CTT GGA GCC ACT ATC GAC TAC GCG ATC ATG GCG 349
Leu Leu Ala Ser Leu Leu Gly Ala Thr Ile Asp Tyr Ala Ile Met Ala
555 560 565

ACC ACA CCC GTC CTG TGG ATC CTC TAC GCC GGA CGC ATC GTG GCC GGC 397
Thr Thr Pro Val Leu Trp Ile Leu Tyr Ala Gly Arg Ile Val Ala Gly
570 575 580

ATC ACC GGC GCC ACA GGT GCG GTT GCT GGC GCC TAT ATC GCC GAC ATC 445
Ile Thr Gly Ala Thr Gly Ala Val Ala Gly Ala Tyr Ile Ala Asp Ile
585 590 595 600

ACC GAT GGG GAA GAT CGG GCT CGC CAC TTC GGG CTC ATG AGC GCT TGT 493
Thr Asp Gly Glu Asp Arg Ala Arg His Phe Gly Leu Met Ser Ala Cys
605 610 615

TTC GGC GTG GGT ATG GTG GCA GGC CCC GTG GCC GGG GGA CTG TTG GGC 541
Phe Gly Val Gly Met Val Ala Gly Pro Val Ala Gly Gly Leu Leu Gly
620 625 630

GCC ATC TCC TTG CAT GCA CCA TTC CTT GCG GCG GCG GTG CTC AAC GGC 589
Ala Ile Ser Leu His Ala Pro Phe Leu Ala Ala Ala Val Leu Asn Gly
635 640 645

CTC AAC CTA CTA CTG GGC TGC TTC CTA ATG CAG GAG TCG CAT AAG GGA 637
Leu Asn Leu Leu Leu Gly Cys Phe Leu Met Gln Glu Ser His Lys Gly
650 655 660

GAG CGT CGA CCG ATG CCC TTG AGA GCC TTC AAC CCA GTC AGC TCC TTC 685
Glu Arg Arg Pro Met Pro Leu Arg Ala Phe Asn Pro Val Ser Ser Phe
665 670 675 680

CGG TGG GCG CGG GGC ATG ACT ATC GTC GCC GCA CTT ATG ACT GTC TTC 733
Arg Trp Ala Arg Gly Met Thr Ile Val Ala Ala Leu Met Thr Val Phe
685 690 695

TTT ATC ATG CAA CTC GTA GGA CAG GTG CCG GCA GCG CTC TGG GTC ATT 781
Phe Ile Met Gln Leu Val Gly Gln Val Pro Ala Ala Leu Trp Val Ile
700 705 710

TTC GGC GAG GAC CGC TTT CGC TGG AGC GCG ACG ATG ATC GGC CTG TCG 829
Phe Gly Glu Asp Arg Phe Arg Trp Ser Ala Thr Met Ile Gly Leu Ser
715 720 725

CTT GCG GTA TTC GGA ATC TTG CAC GCC CTC GCT CAA GCC TTC GTC ACT 877
Leu Ala Val Phe Gly Ile Leu His Ala Leu Ala Gln Ala Phe Val Thr
730 735 740

GGT CCC GCC ACC AAA CGT TTC GGC GAG AAG CAG GCC ATT ATC GCC GGC 925
Gly Pro Ala Thr Lys Arg Phe Gly Glu Lys Gln Ala Ile Ile Ala Gly
745 750 755 760

ATG GCG GCC GAC GCG CTG GGC TAC GTC TTG CTG GCG TTC GCG ACG CGA 973
Met Ala Ala Asp Ala Leu Gly Tyr Val Leu Leu Ala Phe Ala Thr Arg
765 770 775

GGC TGG ATG GCC TTC CCC ATT ATG ATT CTT CTC GCT TCC GGC GGC ATC 1021
Gly Trp Met Ala Phe Pro Ile Met Ile Leu Leu Ala Ser Gly Gly Ile
780 785 790

GGG ATG CCC GCG TTG CAG GCC ATG CTG TCC AGG CAG GTA GAT GAC GAC 1069
Gly Met Pro Ala Leu Gln Ala Met Leu Ser Arg Gln Val Asp Asp Asp
795 800 805

CAT CAG GGA CAG CTT CAA GGA TCG CTC GCG GCT CTT ACC AGC CTA ACT 1117
His Gln Gly Gln Leu Gln Gly Ser Leu Ala Ala Leu Thr Ser Leu Thr
810 815 820

TCG ATC ACT GGA CCG CTG ATC GTC ACG GCG ATT TAT GCC GCC TCG GCG 1165
Ser Ile Thr Gly Pro Leu Ile Val Thr Ala Ile Tyr Ala Ala Ser Ala
825 830 835 840

AGC ACA TGG AAC GGG TTG GCA TGG ATT GTA GGC GCC GCC CTA TAC CTT 1213
Ser Thr Trp Asn Gly Leu Ala Trp Ile Val Gly Ala Ala Leu Tyr Leu
845 850 855

GTC TGC CTC CCC GCG TTG CGT CGC GGT GCA TGG AGC CGG GCC ACC TCG 1261
Val Cys Leu Pro Ala Leu Arg Arg Gly Ala Trp Ser Arg Ala Thr Ser
860 865 870

ACC TGA ATGGAAGCCG GCGGCACCTC GCTAACGGAT TCACCACTCC AAGAATTGGA 1317
Thr *

GCCAATCAAT TCTTGCGGAG AACTGTGAAT GCGCAAACCA ACCCTTGGCA GAACATATCC 1377
ATCGCGTCCG CCATCTCCAG CAGCCGCACG CGGCGCATCT CGGGCAGCGT TGGGTCCTGG 1437
CCACGGGTGC GCATGATCGT GCTCCTGTCG TTGAGGACCC GGCTAGGCTG GCGGGGTTGC 1497
CTTACTGGTT AGCAGAATGA ATCACCGATA CGCGAGCGAA CGTGAAGCGA CTGCTGCTGC 1557
AAAACGTCTG CGACCTGAGC AACAACATGA ATGGTCTTCG GTTTCCGTGT TTCGTAAAGT 1617
CTGGAAACGC GGAAGTCCCC TACGTGCTGC TGAAGTTGCC CGCAACAGAG AGTGGAACCA 1677
ACCGGTGATA CCACGATACT ATGACTGAGA GTCAACGCCA TGAGCGGCCT CATTTCTTAT 1737
TCTGAGTTAC AACAGTCCGC ACCGCTGTCC GGTAGCTCCT TCCGGTGGGC GCGGGGCATG 1797
ACTATCGTCG CCGCACTTAT GACTGTCTTC TTTATCATGC AACTCGTAGG ACAGGTGCCG 1857
GCAGCGCCCA ACAGTCCCCC GGCCACGGGG CCTGCCACCA TACCCACGCC GAAACAAGCG 1917
CCCTGCACCA TTATGTTCCG GATCTGCATC GCAGGATGCT GCTGGCTACC CTGTGGAACA 1977
CCTACATCTG TATTAACGAA GCGCTAACCG TTTTTATCAG GCTCTGGGAG GCAGAATAAA 2037
TGATCATATC GTCAATTATT ACCTCCACGG GGAGAGCCTG AGCAAACTGG CCTCAGGCAT 2097
TTGAGAAGCA CACGGTCACA CTGCTTCCGG TAGTCAATAA ACCGGTAAAC CAGCAATAGA 2157
CATAAGCGGC TATTTAACGA CCCTGCCCTG AACCGACGAC CGGGTCGAAT TTGCTTTCGA 2217
ATTTCTGCCA TTCATCCGCT TATTATCAAT TATTCAGGCG TAGCACCAGG CGTTTAAGGG 2277
CACCAATAAC TGCCTTAAAA AAATTACGCC CCGCCCTGCC ACTCATCGCA GTACTGTTGT 2337
AATTCATTAA GCATTCTGCC GACATGGAAG CCATCACAGA CGGCATGATG AACCTGAATC 2397
GCCAGCGGCA TCAGCACCTT GTCGCCTTGC GTATAATATT TGCCCATGGT GAAAACGGGG 2457
GCGAAGAAGT TGTCCATATT GGCCACGTTT AAATCAAAAC TGGTGAAACT CACCCAGGGA 2517
TTGGCTGAGA CGAAAAACAT ATTCTCAATA AACCCTTTAG GGAAATAGGC CAGGTTTTCA 2577
CCGTAACACG CCACATCTTG CGAATATATG TGTAGAAACT GCCGGAAATC GTCGTGGTAT 2637
TCACTCCAGA GCGATGAAAA CGTTTCAGTT TGCTCATGGA AAACGGTGTA ACAAGGGTGA 2697
ACACTATCCC ATATCACCAG CTCACCGTCT TTCATTGCCA TACGGAATTC CGGATGAGCA 2757
TTCATCAGGC GGGCAAGAAT GTGAATAAAG GCCGGATAAA ACTTGTGCTT ATTTTTCTTT 2817
ACGGTCTTTA AAAAGGCCGT AATATCCAGC TGAACGGTCT GGTTATAGGT ACATTGAGCA 2877
ACTGACTGAA ATGCCTCAAA ATGTTCTTTA CGATGCCATT GGGATATATC AACGGTGGTA 2937
TATCCAGTGA TTTTTTTCTC CATTTTAGCT TCCTTAGCTC CTGAAAATCT CGATAACTCA 2997
AAAAATACGC CCGGTAGTGA TCTTATTTCA TTATGGTGAA AGTTGGAACC TCTTACGTGC 3057
CGATCAACGT CTCATTTTCG CCAAAAGTTG GCCCAGGGCT TCCCGGTATC AACAGGGACA 3117
CCAGGATTTA TTTATTCTGC GAAGTGATCT TCCGTCACAG GTATTTATTC GGCGCAAAGT 3177
GCGTCGGGTG ATGCTGCCAA CTTACTGATT TAGTGTATGA TGGTGTTTTT GAGGTGCTCC 3237
AGTGGCTTCT GTTTCTATCA GCTGTCCCTC CTGTTCAGCT ACTGACGGGG TGGTGCGTAA 3297
CGGCAAAAGC ACCGCCGGAC ATCAGCGCTA GCGGAGTGTA TACTGGCTTA CTATGTTGGC 3357
ACTGATGAGG GTGTCAGTGA AGTGCTTCAT GTGGCAGGAG AAAAAAGGCT GCACCGGTGC 3417
GTCAGCAGAA TATGTGATAC AGGATATATT CCGCTTCCTC GCTCACTGAC TCGCTACGCT 3477
CGGTCGTTCG ACTGCGGCGA GCGGAAATGG CTTACGAACG GGGCGGAGAT TTCCTGGAAG 3537
ATGCCAGGAA GATACTTAAC AGGGAAGTGA GAGGGCCGCG GCAAAGCCGT TTTTCCATAG 3597
GCTCCGCCCC CCTGACAAGC ATCACGAAAT CTGACGCTCA AATCAGTGGT GGCGAAACCC 3657
GACAGGACTA TAAAGATACC AGGCGTTTCC CCTGGCGGCT CCCTCGTGCG CTCTCCTGTT 3717
CCTGCCTTTC GGTTTACCGG TGTCATTCCG CTGTTATGGC CGCGTTTGTC TCATTCCACG 3777
CCTGACACTC AGTTCCGGGT AGGCAGTTCG CTCCAAGCTG GACTGTATGC ACGAACCCCC 3837
CGTTCAGTCC GACCGCTGCG CCTTATCCGG TAACTATCGT CTTGAGTCCA ACCCGGAAAG 3897
ACATGCAAAA GCACCACTGG CAGCAGCCAC TGGTAATTGA TTTAGAGGAG TTAGTCTTGA 3957
AGTCATGCGC CGGTTAAGGC TAAACTGAAA GGACAAGTTT TGGTGACTGC GCTCCTCCAA 4017
GCCAGTTACC TCGGTTCAAA GAGTTGGTAG CTCAGAGAAC CTTCGAAAAA CCGCCCTGCA 4077
AGGCGGTTTT TTCGTTTTCA GAGCAAGAGA TTACGCGCAG ACCAAAACGA TCTCAAGAAG 4137
ATCATCTTAT TAATCAGATA AAATATTTCT AGAGGTGAAC CATCACCCTA ATCAAGTTTT 4197
TTGGGGTCGA GGTGCCGTAA AGCACTAAAT CGGAACCCTA AAGGGATGCC CCGATTTAGA 4257
GCTTGACGGG GAAAGCCGGC GAACGTGGCG AGAAAGGAAG GGAAGAAAGC GAAAGGAGCG 4317
GGCGCTAGGG CGCTGGCAAG TGTAGCGGTC ACGCTGCGCG TAACCACCAC ACCCGCCGCG 4377
CTTAATGCGC CGCTACAGCG CCATTCGCCA TTCAGGCTGC GCAACTGTTG GGAAGGGCGA 4437
TCGGTGCGGG CCTCTTCGCT ATTACGCCAG CTGGCGAAAG GGGGATGTGC TGCAAGGCGA 4497
TTAAGTTGGG TAACGCCAGG GTTTTCCCAG TCACGACGTT GTAAAACGAC GGCCAGTGCC 4557
AAGCTTACTT GTGTATAAGA GTCAGTCGAC CTGCAGGGGG GGGGGGGAAA GCCACGTTGT 4617
GTCTCAAAAT CTCTGATGTT ACATTGCACA AGATAAAAAT ATATCATCAT GAACAATAAA 4677

ACTGTCTGCT TACATAAACA GTAATACAAG GGGTGTT ATG AGC CAT ATT CAA CGG 4732
Met Ser His Ile Gln Arg
1 5

GAA ACG TCT TGC TCG AGG CCG CGA TTA AAT TCC AAC ATG GAT GCT GAT 4780
Glu Thr Ser Cys Ser Arg Pro Arg Leu Asn Ser Asn Met Asp Ala Asp
10 15 20

TTA TAT GGG TAT AAA TGG GCT CGC GAT AAT GTC GGG CAA TCA GGT GCG 4828
Leu Tyr Gly Tyr Lys Trp Ala Arg Asp Asn Val Gly Gln Ser Gly Ala
25 30 35

ACA ATC TAT CGA TTG TAT GGG AAG CCC GAT GCG CCA GAG TTG TTT CTG 4876
Thr Ile Tyr Arg Leu Tyr Gly Lys Pro Asp Ala Pro Glu Leu Phe Leu
40 45 50

AAA CAT GGC AAA GGT AGC GTT GCC AAT GAT GTT ACA GAT GAG ATG GTC 4924
Lys His Gly Lys Gly Ser Val Ala Asn Asp Val Thr Asp Glu Met Val
55 60 65 70

AGA CTA AAC TGG CTG ACG GAA TTT ATG CCT CTT CCG ACC ATC AAG CAT 4972
Arg Leu Asn Trp Leu Thr Glu Phe Met Pro Leu Pro Thr Ile Lys His
75 80 85

TTT ATC CGT ACT CCT GAT GAT GCA TGG TTA CTC ACC ACT GCG ATC CCC 5020
Phe Ile Arg Thr Pro Asp Asp Ala Trp Leu Leu Thr Thr Ala Ile Pro
90 95 100

GGG AAA ACA GCA TTC CAG GTA TTA GAA GAA TAT CCT GAT TCA GGT GAA 5068
Gly Lys Thr Ala Phe Gln Val Leu Glu Glu Tyr Pro Asp Ser Gly Glu
105 110 115

AAT ATT GTT GAT GCG CTG GCA GTG TTC CTG CGC CGG TTG CAT TCG ATT 5116
Asn Ile Val Asp Ala Leu Ala Val Phe Leu Arg Arg Leu His Ser Ile
120 125 130

CCT GTT TGT AAT TGT CCT TTT AAC AGC GAT CGC GTA TTT CGT CTC GCT 5164
Pro Val Cys Asn Cys Pro Phe Asn Ser Asp Arg Val Phe Arg Leu Ala
135 140 145 150

CAG GCG CAA TCA CGA ATG AAT AAC GGT TTG GTT GAT GCG AGT GAT TTT 5212
Gln Ala Gln Ser Arg Met Asn Asn Gly Leu Val Asp Ala Ser Asp Phe
155 160 165

GAT GAC GAG CGT AAT GGC TGG CCT GTT GAA CAA GTC TGG AAA GAA ATG 5260
Asp Asp Glu Arg Asn Gly Trp Pro Val Glu Gln Val Trp Lys Glu Met
170 175 180

CAT AAG CTT TTG CCA TTC TCA CCG GAT TCA GTC GTC ACT CAT GGT GAT 5308
His Lys Leu Leu Pro Phe Ser Pro Asp Ser Val Val Thr His Gly Asp
185 190 195

TTC TCA CTT GAT AAC CTT ATT TTT GAC GAG GGG AAA TTA ATA GGT TGT 5356
Phe Ser Leu Asp Asn Leu Ile Phe Asp Glu Gly Lys Leu Ile Gly Cys
200 205 210

ATT GAT GTT GGA CGA GTC GGA ATC GCA GAC CGA TAC CAG GAT CTT GCC 5404
Ile Asp Val Gly Arg Val Gly Ile Ala Asp Arg Tyr Gln Asp Leu Ala
215 220 225 230

ATC CTA TGG AAC TGC CTC GGT GAG TTT TCT CCT TCA TTA CAG AAA CGG 5452
Ile Leu Trp Asn Cys Leu Gly Glu Phe Ser Pro Ser Leu Gln Lys Arg
235 240 245

CTT TTT CAA AAA TAT GGT ATT GAT AAT CCT GAT ATG AAT AAA TTG CAG 5500
Leu Phe Gln Lys Tyr Gly Ile Asp Asn Pro Asp Met Asn Lys Leu Gln
250 255 260

TTT CAT TTG ATG CTC GAT GAG TTT TTC TAA TCAGAATTGG TTAATTGGTT 5550
Phe His Leu Met Leu Asp Glu Phe Phe *
265 270

GTAACACTGG CAGAGCATTA CGCTGACTTG ACGGGACGGC GGCTTTGTTG AATAAATCGA 5610
ACTTTTGCTG AGTTGAAGGA TCAGATCACG CATCTTCCCG ACAACGCAGA CCGTTCCGTG 5670
GCAAAGCAAA AGTTCAAAAT CACCAACTGG TCCACCTACA ACAAAGCTCT CATCAACCGT 5730
GGCTCCCTCA CTTTCTGGCT GGATGATGGG GCGATTCAGG CCTGGTATGA GTCAGCAACA 5790
CCTTCTTCAC GAGGCAGACC TCAGCGCCCC CCCCCCCCTG CAGGTCGA 5838

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 397 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Met Lys Ser Asn Asn Ala Leu Ile Val Ile Leu Gly Thr Val Thr Leu
1 5 10 15

Asp Ala Val Gly Ile Gly Leu Val Met Pro Val Leu Pro Gly Leu Leu
20 25 30

Arg Asp Ile Val His Ser Asp Ser Ile Ala Ser His Tyr Gly Val Leu
35 40 45

Leu Ala Leu Tyr Ala Leu Met Gln Phe Leu Cys Ala Pro Val Leu Gly
50 55 60

Ala Leu Ser Asp Arg Phe Gly Arg Arg Pro Val Leu Leu Ala Ser Leu
65 70 75 80

Leu Gly Ala Thr Ile Asp Tyr Ala Ile Met Ala Thr Thr Pro Val Leu
85 90 95

Trp Ile Leu Tyr Ala Gly Arg Ile Val Ala Gly Ile Thr Gly Ala Thr
100 105 110

Gly Ala Val Ala Gly Ala Tyr Ile Ala Asp Ile Thr Asp Gly Glu Asp
115 120 125

Arg Ala Arg His Phe Gly Leu Met Ser Ala Cys Phe Gly Val Gly Met
130 135 140

Val Ala Gly Pro Val Ala Gly Gly Leu Leu Gly Ala Ile Ser Leu His
145 150 155 160

Ala Pro Phe Leu Ala Ala Ala Val Leu Asn Gly Leu Asn Leu Leu Leu
165 170 175

Gly Cys Phe Leu Met Gln Glu Ser His Lys Gly Glu Arg Arg Pro Met
180 185 190

Pro Leu Arg Ala Phe Asn Pro Val Ser Ser Phe Arg Trp Ala Arg Gly
195 200 205

Met Thr Ile Val Ala Ala Leu Met Thr Val Phe Phe Ile Met Gln Leu
210 215 220

Val Gly Gln Val Pro Ala Ala Leu Trp Val Ile Phe Gly Glu Asp Arg
225 230 235 240

Phe Arg Trp Ser Ala Thr Met Ile Gly Leu Ser Leu Ala Val Phe Gly
245 250 255

Ile Leu His Ala Leu Ala Gln Ala Phe Val Thr Gly Pro Ala Thr Lys
260 265 270

Arg Phe Gly Glu Lys Gln Ala Ile Ile Ala Gly Met Ala Ala Asp Ala
275 280 285

Leu Gly Tyr Val Leu Leu Ala Phe Ala Thr Arg Gly Trp Met Ala Phe
290 295 300

Pro Ile Met Ile Leu Leu Ala Ser Gly Gly Ile Gly Met Pro Ala Leu
305 310 315 320

Gln Ala Met Leu Ser Arg Gln Val Asp Asp Asp His Gln Gly Gln Leu
325 330 335

Gln Gly Ser Leu Ala Ala Leu Thr Ser Leu Thr Ser Ile Thr Gly Pro
340 345 350

Leu Ile Val Thr Ala Ile Tyr Ala Ala Ser Ala Ser Thr Trp Asn Gly
355 360 365

Leu Ala Trp Ile Val Gly Ala Ala Leu Tyr Leu Val Cys Leu Pro Ala
370 375 380

Leu Arg Arg Gly Ala Trp Ser Arg Ala Thr Ser Thr *
385 390 395

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 220 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

[residues 1 to 80 illegible in source; surviving fragments: Val, Gln, Ile, Thr, Ala, Leu, Lys, Lys, Ile]

Glu Leu Val Ile Trp Asp Ser Val His Pro Cys Tyr Thr Val Phe His
85 90 95

Glu Gln Thr Glu Thr Phe Ser Ser Leu Trp Ser Glu Tyr His Asp Asp
100 105 110

Phe Arg Gln Phe Leu His Ile Tyr Ser Gln Asp Val Ala Cys Tyr Gly
115 120 125

Glu Asn Leu Ala Tyr Phe Pro Lys Gly Phe Ile Glu Asn Met Phe Phe
130 135 140

Val Ser Ala Asn Pro Trp Val Ser Phe Thr Ser Phe Asp Leu Asn Val
145 150 155 160

Ala Asn Met Asp Asn Phe Phe Ala Pro Val Phe Thr Met Gly Lys Tyr
165 170 175

Tyr Thr Gln Gly Asp Lys Val Leu Met Pro Leu Ala Ile Gln Val His
180 185 190

His Ala Val Cys Asp Gly Phe His Val Gly Arg Met Leu Asn Glu Leu
195 200 205

Gln Gln Tyr Cys Asp Glu Trp Gln Gly Gly Ala *
210 215 220

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 272 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

Met Ser His Ile Gln Arg Glu Thr Ser Cys Ser Arg Pro Arg Leu Asn
1 5 10 15

Ser Asn Met Asp Ala Asp Leu Tyr Gly Tyr Lys Trp Ala Arg Asp Asn
20 25 30

Val Gly Gln Ser Gly Ala Thr Ile Tyr Arg Leu Tyr Gly Lys Pro Asp
35 40 45

Ala Pro Glu Leu Phe Leu Lys His Gly Lys Gly Ser Val Ala Asn Asp
50 55 60

Val Thr Asp Glu Met Val Arg Leu Asn Trp Leu Thr Glu Phe Met Pro
65 70 75 80

Leu Pro Thr Ile Lys His Phe Ile Arg Thr Pro Asp Asp Ala Trp Leu
85 90 95

Leu Thr Thr Ala Ile Pro Gly Lys Thr Ala Phe Gln Val Leu Glu Glu
100 105 110

Tyr Pro Asp Ser Gly Glu Asn Ile Val Asp Ala Leu Ala Val Phe Leu
115 120 125

Arg Arg Leu His Ser Ile Pro Val Cys Asn Cys Pro Phe Asn Ser Asp
130 135 140

Arg Val Phe Arg Leu Ala Gln Ala Gln Ser Arg Met Asn Asn Gly Leu
145 150 155 160

Val Asp Ala Ser Asp Phe Asp Asp Glu Arg Asn Gly Trp Pro Val Glu
165 170 175

Gln Val Trp Lys Glu Met His Lys Leu Leu Pro Phe Ser Pro Asp Ser
180 185 190

Val Val Thr His Gly Asp Phe Ser Leu Asp Asn Leu Ile Phe Asp Glu
195 200 205

Gly Lys Leu Ile Gly Cys Ile Asp Val Gly Arg Val Gly Ile Ala Asp
210 215 220

Arg Tyr Gln Asp Leu Ala Ile Leu Trp Asn Cys Leu Gly Glu Phe Ser
225 230 235 240

Pro Ser Leu Gln Lys Arg Leu Phe Gln Lys Tyr Gly Ile Asp Asn Pro
245 250 255

Asp Met Asn Lys Leu Gln Phe His Leu Met Leu Asp Glu Phe Phe *
260 265 270

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Tn5 wild type outside end"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

CTGACTCTTA TACACAAGT 19
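
The wild type outside end of SEQ ID NO:7 occupies positions 1..19 of SEQ ID NO:3, and its reverse complement appears within the row of SEQ ID NO:3 that begins AAGCTTACTT, which by row count appears to cover the second insertion_seq feature at 4564..4582. The sketch below is illustrative only; `reverse_complement` is a hypothetical helper, and the fragment string is copied from that row of the listing as transcribed.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (whitespace ignored)."""
    cleaned = "".join(seq.split()).upper()
    return cleaned.translate(COMPLEMENT)[::-1]


oe = "CTGACTCTTA TACACAAGT"                      # SEQ ID NO:7
fragment = "AAGCTTACTT GTGTATAAGA GTCAGTCGAC"    # copied from SEQ ID NO:3

rc = reverse_complement(oe)
offset = "".join(fragment.split()).find(rc)      # 0-based offset in fragment

print(rc, offset)  # prints: ACTTGTGTATAAGAGTCAG 6
```

A non-negative offset indicates that the two 19 base pair ends are present in inverted orientation relative to one another, which is consistent with the two insertion_seq features listed for SEQ ID NO:3.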

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Tn5 mutant outside end"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

CTGTCTCTTA TACACATCT 19

(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Tn5 mutant outside end"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

CTGTCTCTTA TACAGATCT 19

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Tn5 wild type inside end"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

CTGTCTCTTG ATCAGATCT 19
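
The four 19 base pair end sequences of SEQ ID NOS:7 to 10 differ at only a few positions. The following sketch, which is illustrative and not part of the listing, reports the 1-based positions at which two ends differ; the name `diff_positions` and the dictionary keys are hypothetical, and the sequences are copied from the entries above.

```python
ENDS = {
    "SEQ7_OE_wild_type": "CTGACTCTTATACACAAGT",
    "SEQ8_OE_mutant": "CTGTCTCTTATACACATCT",
    "SEQ9_OE_mutant": "CTGTCTCTTATACAGATCT",
    "SEQ10_IE_wild_type": "CTGTCTCTTGATCAGATCT",
}


def diff_positions(a: str, b: str) -> list[int]:
    """Return the 1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


oe = ENDS["SEQ7_OE_wild_type"]
ie = ENDS["SEQ10_IE_wild_type"]

print(diff_positions(oe, ie))                       # [4, 10, 11, 12, 15, 17, 18]
print(diff_positions(oe, ENDS["SEQ8_OE_mutant"]))   # [4, 17, 18]
print(diff_positions(oe, ENDS["SEQ9_OE_mutant"]))   # [4, 15, 17, 18]
```

On this comparison the wild type outside and inside ends differ at seven positions, and at each of the positions where a mutant end departs from the wild type outside end it carries the base found in the wild type inside end.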

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19182 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Plasmid pRZ4196"

(ix) FEATURE:
(A) NAME/KEY: repeat_unit
(B) LOCATION: 94..112
(D) OTHER INFORMATION: /note= "Wild type OE sequence"

(ix) FEATURE:
(A) NAME/KEY: repeat_unit
(B) LOCATION: 12184..12225
(D) OTHER INFORMATION: /note= "Cassette IE"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:
TTCCTGTAAC AATAGCAATA CCCCAAATAC CTAATGTAGT TCCAGCAAGC AAGCTAAAAA 60
GTAAAGCAAC AACATAACTC ACCCCTGCAT CTGCTGACTC TTATACACAA GTAGCGTCCC 120
GGGATCGGGA TCCCGTCGTT TTACAACGTC GTGACTGGGA AAACCCTGGC GTTACCCAAC 180
TTAATCGCCT TGCAGCACAT CCCCCTTTCG CCAGCTGGCG TAATAGCGAA GAGGCCCGCA 240
CCGATCGCCC TTCCCAACAG TTGCGCAGCC TGAATGGCGA ATGGCGCTTT GCCTGGTTTC 300
CGGCACCAGA AGCGGTGCCG GAAAGCTGGC TGGAGTGCGA TCTTCCTGAG GCCGATACTG 360
TCGTCGTCCC CTCAAACTGG CAGATGCACG GTTACGATGC GCCCATCTAC ACCAACGTAA 420
CCTATCCCAT TACGGTCAAT CCGCCGTTTG TTCCCACGGA GAATCCGACG GGTTGTTACT 480
CGCTCACATT TAATGTTGAT GAAAGCTGGC TACAGGAAGG CCAGACGCGA ATTATTTTTG 540
ATGGCGTTAA CTCGGCGTTT CATCTGTGGT GCAACGGGCG CTGGGTCGGT TACGGCCAGG 600
ACAGTCGTTT GCCGTCTGAA TTTGACCTGA GCGCATTTTT ACGCGCCGGA GAAAACCGCC 660
TCGCGGTGAT GGTGCTGCGT TGGAGTGACG GCAGTTATCT GGAAGATCAG GATATGTGGC 720
GGATGAGCGG CATTTTCCGT GACGTCTCGT TGCTGCATAA ACCGACTACA CAAATCAGCG 780
ATTTCCATGT TGCCACTCGC TTTAATGATG ATTTCAGCCG CGCTGTACTG GAGGCTGAAG 840
TTCAGATGTG CGGCGAGTTG CGTGACTACC TACGGGTAAC AGTTTCTTTA TGGCAGGGTG 900
AAACGCAGGT CGCCAGCGGC ACCGCGCCTT TCGGCGGTGA AATTATCGAT GAGCGTGGTG 960
GTTATGCCGA TCGCGTCACA CTACGTCTGA ACGTCGAAAA CCCGAAACTG TGGAGCGCCG 1020
AAATCCCGAA TCTCTATCGT GCGGTGGTTG AACTGCACAC CGCCGACGGC ACGCTGATTG 1080
AAGCAGAAGC CTGCGATGTC GGTTTCCGCG AGGTGCGGAT TGAAAATGGT CTGCTGCTGC 1140
TGAACGGCAA GCCGTTGCTG ATTCGAGGCG TTAACCGTCA CGAGCATCAT CCTCTGCATG 1200
GTCAGGTCAT GGATGAGCAG ACGATGGTGC AGGATATCCT GCTGATGAAG CAGAACAACT 1260
TTAACGCCGT GCGCTGTTCG CATTATCCGA ACCATCCGCT GTGGTACACG CTGTGCGACC 1320
GCTACGGCCT GTATGTGGTG GATGAAGCCA ATATTGAAAC CCACGGCATG GTGCCAATGA 1380
ATCGTCTGAC CGATGATCCG CGCTGGCTAC CGGCGATGAG CGAACGCGTA ACGCGAATGG 1440
TGCAGCGCGA TCGTAATCAC CCGAGTGTGA TCATCTGGTC GCTGGGGAAT GAATCAGGCC 1500
ACGGCGCTAA TCACGACGCG CTGTATCGCT GGATCAAATC TGTCGATCCT TCCCGCCCGG 1560
TGCAGTATGA AGGCGGCGGA GCCGACACCA CGGCCACCGA TATTATTTGC CCGATGTACG 1620
CGCGCGTGGA TGAAGACCAG CCCTTCCCGG CTGTGCCGAA ATGGTCCATC AAAAAATGGC 1680
TTTCGCTACC TGGAGAGACG CGCCCGCTGA TCCTTTGCGA ATACGCCCAC GCGATGGGTA 1740
ACAGTCTTGG CGGTTTCGCT AAATACTGGC AGGCGTTTCG TCAGTATCCC CGTTTACAGG 1800
GCGGCTTCGT CTGGGACTGG GTGGATCAGT CGCTGATTAA ATATGATGAA AACGGCAACC 1860
CGTGGTCGGC TTACGGCGGT GATTTTGGCG ATACGCCGAA CGATCGCCAG TTCTGTATGA 1920
ACGGTCTGGT CTTTGCCGAC CGCACGCCGC ATCCAGCGCT GACGGAAGCA AAACACCAGC 1980
AGCAGTTTTT CCAGTTCCGT TTATCCGGGC AAACCATCGA AGTGACCAGC GAATACCTGT 2040
TCCGTCATAG CGATAACGAG CTCCTGCACT GGATGGTGGC GCTGGATGGT AAGCCGCTGG 2100
CAAGCGGTGA AGTGCCTCTG GATGTCGCTC CACAAGGTAA ACAGTTGATT GAACTGCCTG 2160
AACTACCGCA GCCGGAGAGC GCCGGGCAAC TCTGGCTCAC AGTACGCGTA GTGCAACCGA 2220
ACGCGACCGC ATGGTCAGAA GCCGGGCACA TCAGCGCCTG GCAGCAGTGG CGTCTGGCGG 2280
AAAACCTCAG TGTGACGCTC CCCGCCGCGT CCCACGCCAT CCCGCATCTG ACCACCAGCG 2340
AAATGGATTT TTGCATCGAG CTGGGTAATA AGCGTTGGCA ATTTAACCGC CAGTCAGGCT 2400
TTCTTTCACA GATGTGGATT GGCGATAAAA AACAACTGCT GACGCCGCTG CGCGATCAGT 2460
TCACCCGTGC ACCGCTGGAT AACGACATTG GCGTAAGTGA AGCGACCCGC ATTGACCCTA 2520
ACGCCTGGGT CGAACGCTGG AAGGCGGCGG GCCATTACCA GGCCGAAGCA GCGTTGTTGC 2580
AGTGCACGGC AGATACACTT GCTGATGCGG TGCTGATTAC GACCGCTCAC GCGTGGCAGC 2640
ATCAGGGGAA AACCTTATTT ATCAGCCGGA AAACCTACCG GATTGATGGT AGTGGTCAAA 2700
TGGCGATTAC CGTTGATGTT GAAGTGGCGA GCGATACACC GCATCCGGCG CGGATTGGCC 2760
TGAACTGCCA GCTGGCGCAG GTAGCAGAGC GGGTAAACTG GCTCGGATTA GGGCCGCAAG 2820
AAAACTATCC CGACCGCCTT ACTGCCGCCT GTTTTGACCG CTGGGATCTG CCATTGTCAG 2880
ACATGTATAC CCCGTACGTC TTCCCGAGCG AAAACGGTCT GCGCTGCGGG ACGCGCGAAT 2940




TGAATTATGG CCCACACCAG TGGCGCGGCG ACTTCCAGTT CAACATCAGC CGCTACAGTC 3000
AACAGCAACT GATGGAAACC AGCCATCGCC ATCTGCTGCA CGCGGAAGAA GGCACATGGC 3060
TGAATATCGA CGGTTTCCAT ATGGGGATTG GTGGCGACGA CTCCTGGAGC CCGTCAGTAT 3120
CGGCGGATTC CAGCTGAGCG CCGGTCGCTA CCATTACCAG TTGGTCTGGT GTCAAAAATA 3180
ATAATAACCG GGCAGGCCAT GTCTGCCCGT ATTTCGCGTA AGGAAATCCA TTATGTACTA 3240
TTTAAAAAAC ACAAACTTTT GGATGTTCGG TTTATTCTTT TTCTTTTACT TTTTTATCAT 3300
GGGAGCCTAC TTCCCGTTTT TCCCGATTTG GCTACATGAC ATCAACCATA TCAGCAAAAG 3360
TGATACGGGT ATTATTTTTG CCGCTATTTC TCTGTTCTCG CTATTATTCC AACCGCTGTT 3420
TGGTCTGCTT TCTGACAAAC TCGGGCTGCG CAAATACCTG CTGTGGATTA TTACCGGCAT 3480
GTTAGTGATG TTTGCGCCGT TCTTTATTTT TATCTTCGGG CCACTGTTAC AATACAACAT 3540
TTTAGTAGGA TCGATTGTTG GTGGTATTTA TCTAGGCTTT TGTTTTAACG CCGGTGCGCC 3600
AGCAGTAGAG GCATTTATTG AGAAAGTCAG CCGTCGCAGT AATTTCGAAT TTGGTCGCGC 3660
GCGGATGTTT GGCTGTGTTG GCTGGGCGCT GTGTGCCTCG ATTGTCGGCA TCATGTTCAC 3720
CATCAATAAT CAGTTTGTTT TCTGGCTGGG CTCTGGCTGT GCACTCATCC TCGCCGTTTT 3780
ACTCTTTTTC GCCAAAACGG ATGCGCCCTC TTCTGCCACG GTTGCCAATG CGGTAGGTGC 3840
CAACCATTCG GCATTTAGCC TTAAGCTGGC ACTGGAACTG TTCAGACAGC CAAAACTGTG 3900
GTTTTTGTCA CTGTATGTTA TTGGCGTTTC CTGCACCTAC GATGTTTTTG ACCAACAGTT 3960
TGCTAATTTC TTTACTTCGT TCTTTGCTAC CGGTGAACAG GGTACGCGGG TATTTGGCTA 4020
CGTAACGACA ATGGGCGAAT TACTTAACGC CTCGATTATG TTCTTTGCGC CACTGATCAT 4080
TAATCGCATC GGTGGGAAAA ACGCCCTGCT GCTGGCTGGC ACTATTATGT CTGTACGTAT 4140
TATTGGCTCA TCGTTCGCCA CCTCAGCGCT GGAAGTGGTT ATTCTGAAAA CGCTGCATAT 4200
GTTTGAAGTA CCGTTCCTGC TGGTGGGCTG CTTTAAATAT ATTACCAGCC AGTTTGAAGT 4260
GCGTTTTTCA GCGACGATTT ATCTGGTCTG TTTCTGCTTC TTTAAGCAAC TGGCGATGAT 4320
TTTTATGTCT GTACTGGCGG GCAATATGTA TGAAAGCATC GGTTTCCAGG GCGCTTATCT 4380
GGTGCTGGGT CTGGTGGCGC TGGGCTTCAC CTTAATTTCC GTGTTCACGC TTAGCGGCCC 4440
CGGCCCGCTT TCCCTGCTGC GTCGTCAGGT GAATGAAGTC GCTTAAGCAA TCAATGTCGG 4500
ATGCGGCGCG ACGCTTATCC GACCAACATA TCATAACGGA GTGATCGCAT TGAACATGCC 4560
AATGACCGAA AGAATAAGAG CAGGCAAGCT ATTTACCGAT ATGTGCGAAG GCTTACCGGA 4620
AAAAAGACTT CGTGGGAAAA CGTTAATGTA TGAGTTTAAT CACTCGCATC CATCAGAAGT 4680
TGAAAAAAGA GAAAGCCTGA TTAAAGAAAT GTTTGCCACG GTAGGGGAAA ACGCCTGGGT 4740
AGAACCGCCT GTCTATTTCT CTTACGGTTC CAACATCCAT ATAGGCCGCA ATTTTTATGC 4800
AAATTTCAAT TTAACCATTG TCGATGACTA CACGGTAACA ATCGGTGATA ACGTACTGAT 4860
TGCACCCAAC GTTACTCTTT CCGTTACGGG ACACCCTGTA CACCATGAAT TGAGAAAAAA 4920
CGGCGAGATG TACTCTTTTC CGATAACGAT TGGCAATAAC GTCTGGATCG GAAGTCATGT 4980
GGTTATTAAT CCAGGCGTCA CCATCGGGGA TAATTCTGTT ATTGGCGCGG GTAGTATCGT 5040
CACAAAAGAC ATTCCACCAA ACGTCGTGGC GGCTGGCGTT CCTTGTCGGG TTATTCGCGA 5100
AATAAACGAC CGGGATAAGC ACTATTATTT CAAAGATTAT AAAGTTGAAT CGTCAGTTTA 5160
AATTATAAAA ATTGCCTGAT ACGCTGCGCT TATCAGGCCT ACAAGTTCAG CGATCTACAT 5220
TAGCCGCATC CGGCATGAAC AAAGCGCAGG AACAAGCGTC GCATCATGCC TCTTTGACCC 5280
ACAGCTGCGG AAAACGTACT GGTGCAAAAC GCAGGGTTAT GATCATCAGC CCAACGACGC 5340
ACAGCGCATG AAATGCCCAG TCCATCAGGT AATTGCCGCT GATACTACGC AGCACGCCAG 5400
AAAACCACGG GGCAAGCCCG GCGATGATAA AACCGATTCC CTGCATAAAC GCCACCAGCT 5460
TGCCAGCAAT AGCCGGTTGC ACAGAGTGAT CGAGCGCCAG CAGCAAACAG AGCGGAAACG 5520
CGCCGCCCAG ACCTAACCCA CACACCATCG CCCACAATAC CGGCAATTGC ATCGGCAGCC 5580
AGATAAAGCC GCAGAACCCC ACCAGTTGTA ACACCAGCGC CAGCATTAAC AGTTTGCGCC 5640
GATCCTGATG GCGAGCCATA GCAGGCATCA GCAAAGCTCC TGCGGCTTGC CCAAGCGTCA 5700
TCAATGCCAG TAAGGAACCG CTGTACTGCG CGCTGGCACC AATCTCAATA TAGAAAGCGG 5760
GTAACCAGGC AATCAGGCTG GCGTAACCGC CGTTAATCAG ACCGAAGTAA ACACCCAGCG 5820
TCCACGCGCG GGGAGTGAAT ACCACGCGAA CCGGAGTGGT TGTTGTCTTG TGGGAAGAGG 5880
CGACCTCGCG GGCGCTTTGC CACCACCAGG CAAAGAGCGC AACAACGGCA GGCAGCGCCA 5940
CCAGGCGAGT GTTTGATACC AGGTTTCGCT ATGTTGAACT AACCAGGGCG TTATGGCGGC 6000
ACCAAGCCCA CCGCCGCCCA TCAGAGCCGC GGACCACAGC CCCATCACCA GTGGCGTGCG 6060
CTGCTGAAAC CGCCGTTTAA TCACCGAAGC ATCACCGCCT GAATGATGCC GATCCCCACC 6120
CCACCAAGCA GTGCGCTGCT AAGCAGCAGC GCACTTTGCG GGTAAAGCTC ACGCATCAAT 6180
GCACCGACGG CAATCAGCAA CAGACTGATG GCGACACTGC GACGTTCGCT GACATGCTGA 6240
TGAAGCCAGC TTCCGGCCAG CGCCAGCCCG CCCATGGTAA CCACCGGCAG AGCGGTCGAC 6300
CCGGACGGGA CGCTCCTGCG CCTGATACAG AACGAATTGC TTGCAGGCAT CTCATGAGTG 6360
TGTCTTCCCG TTTTCCGCCT GAGGTCACTG CGTGGATGGA GCGCTGGCGC CTGCTGCGCG 6420
ACGGCGAGCT GCTCACCACC CACTCGAGCT GGATACTTCC CGTCCGCCAG GGGGACATGC 6480
CGGCGATGCT GAAGGTCGCG CGCATTCCCG ATGAAGAGGC CGGTTACCGC CTGTTGACCT 6540
GGTGGGACGG GCAGGGCGCC GCCCGAGTCT TCGCCTCGGC GGCGGGCGCT CTGCTCATGG 6600
AGCGCGCGTC CGGGGCCGGG GACCTTGCAC AGATAGCGTG GTCCGGCCAG GACGACGAGG 6660
CTTGCAGGAT CTATGATTCC CTTTGTCAAC AGCAATGGAT CACTGAAAAT GGTTCAATGA 6720
TCACATTAAG TGGTATTCAA TATTTTCATG AAATGGGAAT TGACGTTCCT TCCAAACATT 6780
CACGTAAAAT CTGTTGTGCG TGTTTAGATT GGAGTGAACG CCGTTTCCAT TTAGGTGGGT 6840
ACGTTGGAGC CGCATTATTT TCGCTTTATG AATCTAAAGG GTGGTTAACT CGACATCTTG 6900
GTTACCGTGA AGTTACCATC ACGGAAAAAG GTTATGCTGC TTTTAAGACC CACTTTCACA 6960
TTTAAGTTGT TTTTCTAATC CGCATATGAT CAATTCAAGG CCGAATAAGA AGGCTGGCTC 7020






WO 98/10077 PCT/US97/15941 

TGCACCTTGG TGATCAAATA ATTCGATAGC TTGTCGTAAT AATGGCGGCA TACTATCAGT 7080

AGTAGGTGTT TCCCTTTCTT CTTTAGCGAC TTGATGCTCT TGATCTTCCA ATACGCAACC 7140 

TAAAGTAAAA TGCCCCACAG CGCTGAGTGC ATATAATGCA TTCTCTAGTG AAAAACCTTG 7200 

TTGGCATAAA AAGGCTAATT GATTTTCGAG AGTTTCATAC TGTTTTTCTG TAGGCCGTGT 7260 

ACCTAAATGT ACTTTTGCTC CATCGCGATG ACTTAGTAAA GCACATCTAA AACTTTTAGC 7320 

GTTATTACGT AAAAAATCTT GCCAGCTTTC CCCTTCTAAA GGGCAAAAGT GAGTATGGTG 7380

CCTATCTAAC ATCTCAATGG CTAAGGCGTC GAGCAAAGCC CGCTTATTTT TTACATGCCA 7440 

ATACAATGTA GGCTGCTCTA CACCTAGCTT CTGGGCGAGT TTACGGGTTG TTAAACCTTC 7500 

GATTCCGACC TCATTAAGCA GCTCTAATGC GCTGTTAATC ACTTTACTTT TATCTAATCT 7560 

AGACATCATT AATTCCTAAT TTTTGTTGAC ACTCTATCAT TGATAGAGTT ATTTTACCAC 7620 

TCCCTATCAG TGATAGAGAA AAGTGAAATG AATAGTTCGA CAAAGATCGC ATTGGTAATT 7680

ACGTTACTCG ATGCCATGGG GATTGGCCTT ATCATGCCAG TCTTGCCAAC GTTATTACGT 7740 

GAATTTATTG CTTCGGAAGA TATCGCTAAC CACTTTGGCG TATTGCTTGC ACTTTATGCG 7800 

TTAATGCAGG TTATCTTTGC TCCTTGGCTT GGAAAAATGT CTGACCGATT TGGTCGGCGC 7860 

CCAGTGCTGT TGTTGTCATT AATAGGCGCA TCGCTGGATT ACTTATTGCT GGCTTTTTCA 7920 

AGTGCGCTTT GGATGCTGTA TTTAGGCCGT TTGCTTTCAG GGATCACAGG AGCTACTGGG 7980

GCTGTCGCGG CATCGGTCAT TGCCGATACC ACCTCAGCTT CTCAACGCGT GAAGTGGTTC 8040 

GGTTGGTTAG GGGCAAGTTT TGGGCTTGGT TTAATAGCGG GGCCTATTAT TGGTGGTTTT 8100 
GCAGGAGAGA TTTCACCGCA TAGTCCCTTT TTTATCGCTG CGTTGCTAAA TATTGTCACT 8160 
TTCCTTGTGG TTATGTTTTG GTTCCGTGAA ACCAAAAATA CACGTGATAA TACAGATACC 8220 
GAAGTAGGGG TTGAGACGCA ATCGAATTCG GTATACATCA CTTTATTTAA AACGATGCCC 8280
ATTTTGTTGA TTATTTATTT TTCAGCGCAA TTGATAGGCC AAATTCCCGC AACGGTGTGG 8340 
GTGCTATTTA CCGAAAATCG TTTTGGATGG AATAGCATGA TGGTTGGCTT TTCATTAGCG 8400

GGTCTTGGTC TTTTACACTC AGTATTCCAA GCCTTTGTGG CAGGAAGAAT AGCCACTAAA 8460 
TGGGGCGAAA AAACGGCAGT ACTGCTCGAA TTTATTGCAG ATAGTAGTGC ATTTGCCTTT 8520 
TTAGCGTTTA TATCTGAAGG TTGGTTAGAT TTCCCTGTTT TAATTTTATT GGCTGGTGGT 8580

GGGATCGCTT TACCTGCATT ACAGGGAGTG ATGTCTATCC AAACAAAGAG TCATGAGCAA 8640 
GGTGCTTTAC AGGGATTATT GGTGAGCCTT ACCAATGCAA CCGGTGTTAT TGGCCCATTA 8700 
CTGTTTACTG TTATTTATAA TCATTCACTA CCAATTTGGG ATGGCTGGAT TTGGATTATT 8760 
GGTTTAGCGT TTTACTGTAT TATTATCCTG CTATCGATGA CCTTCATGTT AACCCCTCAA 8820

GCTCAGGGGA GTAAACAGGA GACAAGTGCT TAGTTATTTC GTCACCAAAT GATGTTATTC 8880

CGCGAAATAT AATGACCCTC TTGATAACCC AAGAGGGCAT TTTTTACGAT AAAGAAGATT 8940 
TAGCTTCAAA TAAAACCTAT CTATTTTATT TATCTTTCAA GCTCAATAAA AAGCCGCGGT 9000 
AAATAGCAAT AAATTGGCCT TTTTTATCGG CAAGCTCTTT TAGGTTTTTC GCATGTATTG 9060 
CGATATGCAT AAACCAGCCA TTGAGTAAGT TTTTAAGCAC ATCACTATCA TAAGCTTTAA 9120
GTTGGTTCTC TTGGATCAAT TTGCTGACAA TGGCGTTTAC CTTACCAGTA ATGTATTCAA 9180
GGCTAATTTT TTCAAGTTCA TTCCAACCAA TGATAGGCAT CACTTCTTGG ATAGGGATAA 9240
GGTTTTTATT ATTATCAATA ATATAATCAA GATAATGTTC AAATATACTT TCTAAGGCAG 9300
ACCAACCATT TGTTAAATCA GTTTTTGTTG TGATGTAGGC ATCAATCATA ATTAATTGCT 9360
GCTTATAACA GGCACTGAGT AATTGTTTTT TATTTTTAAA GTGATGATAA AAGGCACCTT 9420
TGGTCACCAA CGCTTTTCCC GAGATCCTCT GCGACACCGC CGCTCGTCTG CACGCGCCGC 9480
GGTCCGGACC GCCGCCCGAT CTCCATCCGC TACAGGAATG GTTCCAGCCG CTTTTCCGGT 9540
TGGCCGCTGA GCACGCGGCA CTTGCGCCCG CCGCCAGCGT AGCGCGCCAA CTTCTGGCGG 9600
CGCCGCGCGA GGTGTGCCCG CTCCACGGCG ACCTGCACCA CGAGAACGTG CTCGACTTCG 9660
GCGACCGCGG CTGGCTGGCC ATCGACCCGC ACGGACTGCT CGGCGAGCGC ACCTTCGACT 9720
ATGCCAACAT CTTCACGAAT CCCGATCTCA GCGACCCCGG TCGCCCGCTT GCGATCCTGC 9780
CGGGCAGGCT GGAGGCTCGA CTCAGCATTG TGGTCGCGAC GACCGGGTTT GAGCCCGAAC 9840
GGCTTCTTCG CTGGATCATT GCATGGACGG GCTTGTCGGC AGCCTGGTTC ATCGGCGACG 9900
GCGACGGCGA GGGCGAGGGC GCTGCGATTG ATCTGGCCGT AAACGCCATG GCACGCCGGT 9960
TGCTTGACTA GCGCGGTCAC CGATCTCACC TGGTCGTCGA GCTAGGTCAG GCCGTGTCGG 10020
GCGTGATCCG CTGGAAGTCG TTGCGGGCCA CACCCGCCGC CTCGAAGCCC TGCACCAGGC 10080
CGGCATCGTG GTGTGCGTGG CCGAGGGACT ATGGAAGGTG CCGGACGATC TGCCCGAGCA 10140
GGGCCGCCGC TATGACGCCC AGCGTCTTGG TGGCGTGACG GTGGAGCTGA AATCGCACCT 10200
GCCCATCGAG CGGCAGGCCC GCGTGATCGG TGCCACCTGG CTTGACCAGC AGTTGATCGA 10260
CGGTGGCTCG GGCTTGGGCG ACCTGGGCTT TAGCAGTGAG GCCAAGTAGG CGATACAGCA 10320
GCGCGCGGAC TTCCTGGCCG AACAGGGACT GGCCGAGCGG CGCGGGCAGC GCGTGATCCT 10380
CACCGGAATC TGCTGGGCAG CAGCGGGCTC GGGAACTGGC GCAGGCCGCG AAGGACATTG 10440
CCGCCGATAC CGGCCTGGAG CATCGCCCCG TGGCCGACGG CCAGCGCGTT GCCGGCGTCT 10500
ACCGGCGCCC CGTCATGCTC GCCAGCGGGC GAAATGGGAT GCTTGATGAC GCCAAGGGGT 10560
CCAGCCTCGT GCGGTGGAAG CCCATCGAAC AGCGGCTTGG GGAGCAGCTC GCCGCGACGG 10620
TGCGCGGTGG CGGCGTGTCT TGGGAGATTG GACGACAGCG TGGGCCGGCC CCTGTCTCTT 10680
GATCAGATCT TGATCCCCTG CGCCATCAGA TCCTTGGCGG CAAGAAAGCC ATCCAGTTTA 10740
CTTTGCAGGG CTTCCCAACC TTCCCAGAGG GCGCCCCAGC TGGCAATTCC GGTTCGCTTG 10800
CTGTCCATAA AACCGCCCAG TCTAGCTATC GCCATGTAAG CCCACTGCAA GCTACCTGCT 10860
TTCTCTTTGC GCTTGCGTTT TCCCTTGTCC AGATAGCCCA GTAGCTGACA TTCATCCGGG 10920
GTCAGCACCG TTTCTGCGGA CTGGCTTTCT ACGTGTTCCG CTTCCTTTAG CAGCCCTTGC 10980
GCCCTGAGTG CTTGCGGCAG CGTGAAGCTT TCTCTGAGCT GTAACAGCCT GACCGCAACA 11040
AACGAGAGGA TCGAGACCAT CCGCTCCAGA TTATCCGGCT CCTCCATGCG TTGCCTCTCG 11100
GCTCCTGCTC CGGTTTTCCA TGCCTTATGG AACTCCTCGA TCCGCCAGCG ATGGGTATAA 11160

ATGTCGATGA CGCGCAAGGC TTGGGCTAGC GACTCGACCG GTTCGCCGGT CAGCAACAAC 
CATTTCAACG GGGTCTCACC CTTGGGCGGG TTAATCTCCT CGGCCAGCAC CGCGTTGAGC 
GTGATATTCC CCTGTTTTAG CGTGATGCGC CCACTGCGCA GGCTCAAGCT CGCCTTGCGG 
GCTGGTCGAT TTTTACGTTT ACCGCGTTTA TCCACCACGC CCTTTTGCGG AATGCTGATC 
TGATAGCCAC CCAACTCCGG TTGGTTCTTC AGATGGTCGA TCAGATACAA CCCAGACTCT

ACGTCCTTGC GTGGGTGCTT GGAGCGCACC ACGAAGCGCT CGTTATGCGC CAGCCTGTCC 
TGCAGATAAG CATGAATATC GGCTTCGCGG TCACAGACCG CAATCACGTT GCTCATCATG 
CTGCCCATGC GTAACCGGCT AGTTGCGGCC GCTGCCAGCC ATTTGCCACT CTCCTTTTCA 
TCCGCATCGG CAGGGTCATC CGGGCGCATC CACCACTCCT GATGCAGTAA TCCTACGGTG 
CGGAATGTGG TGGCCTCGAG CAAGAGAACG GAGTGAACCC ACCATCCGCG GGATTTATCC

TGAATAGAGC CCAGCTTGCC AAGCTCTTCG GCGACCTGGT GGCGATAACT CAAAGAGGTG 
GTGTCCTCAA TGGCCAGCAG TTCGGGAAAC TCCTGAGCCA ACTTGACTGT TTGCATGGCG 
CCAGCCTTTC TGATCGCCTC GGCAGAAACG TTGGGATTGC GGATAAATCG GTAAGCGCCT 
TCCTGCATGG CTTCACTACC CTCTGATGAG ATGGTTATTG ATTTACCAGA ATATTTTGCC 
AATTGGGCGG CGACGTTAAC CAAGCGGGCA GTACGGCGAG GATCACCCAG CGCCGCCGAA

GAGAACACAG ATTTAGCCCA GTCGGCCGCA CGATGAAGAG CAGAAGTTAT CATGAACGTT 
ACCATGTTAG GAGGTCACAT GGAAGATCAG ATCCTGGAAA ACGGGAAAGG TTCCGTTCGA 
ATTGCATGCG GATCCGGGAT CAAGATCTGA TCAAGAGACA GGTACCAATT GTTGAAGACG 
AAAGGGCCTC GTGATACGCC TATTTTTATA GGTTAATGTC ATGATAATAA TGGTTTCTTA 
GACGTCAGGT GGCACTTTTC GGGGAAATGT GCGCGGAACC CCTATTTGTT TATTTTTCTA

AATACATTCA AATATGTATC CGCTCATGAG ACAATAACCC TGATAAATGC TTCAATAATA 
TTGAAAAAGG AAGAGTATGA GTATTCAACA TTTCCGTGTC GCCCTTATTC CCTTTTTTGC 
GGCATTTTGC CTTCCTGTTT TTGCTCACCC AGAAACGCTG GTGAAAGTAA AAGATGCTGA 
AGATCAGTTG GGTGCACGAG TGGGTTACAT CGAACTGGAT CTCAACAGCG GTAAGATCCT
TGAGAGTTTT CGCCCCGAAG AACGTTTTCC AATGATGAGC ACTTTTAAAG TTCTGCTATG

TGGCGCGGTA TTATCCCGTG TTGACGCCGG GCAAGAGCAA CTCGGTCGCC GCATACACTA
TTCTCAGAAT GACTTGGTTG AGTACTTGGC AAACTGATCT AAATGTTTAG CCCAGTCATC 
ATACTTCACC GATGCCAACG CATTAAAAAT AGCATCACGA TCGGCTTTGC TGAATTTCTT 
ATTTAAAACA TCCTTGTATT TTTCAAAAGC AGCGAGAGCT TCATTCACAT TGCCGATTTT 
CTTACCTTTA GACTTATCAG CAAGTTCCTG TGCCATTTTC GAATATTTTT CACCATATTT

TTCAGTCAGC GTTTGATAAA AGCTAACTGT TGCATCAACA GCATCCTTAA TCTGTGAATT 
AAGGAGATTA TTCTGTGCTT TTTTCAAATT TTCTTCAGCT TCATGAACAC GAGCGATACC 
GGCATTACGA TTATTACTGA CCTGAGAAAT AGCCTTCTGG ATCTGAGTTA TATCAGCATT 
TATCCGGTTA ATACGTGTTT CTGATGCTGT TACCTGTTTT TGTTTTTCTT CTCTAATCTT
ACCGGCCCCA ACCCGTCGTC TGGTTGCTTC AAAAAAAGGA CGGTTCTGAA GCGGATCATT
GGCTCTTGGT GATAGTTTTT TGACCAGCTC ATCCAGTTCT TTATATTTAG CGGATGCCTG
AGCCAGTTCA TTTCGTTTTC CAGCGAGCGT TTTCATTTCT GCATCACGGG CATGGATACT
GGAGCTTAAA CGAGAATTGA GAGTCTTAAT CTCTCCATCC ATTTTCACCA CTTCAGATTG
TGCAGCAGAA AGTTTTTTTT GGGCGATCTC AACAGCTTTA GCTTCTTCAC TCAATGCAGC
CAGTCGTTTC TCTTCAGCTT CAGCCAGTTT CAACTGGCGT TCTGTTTCAG CCTTCTCCCG
TTCAATCTCT TTACGTCGTT GTTCTGCTTC CTGAAAAGCC TTTTCTGCTG CTTCCGCTTC
TTTACGGGCT TTTTCTTCTG CTTTCGCAAG GCGCAAACGC TCTGCTTCCG CCTGCATAGC
TGCATTATTA GCATGAGCAA GCTCTGTTGC TGAAGGCGTA CGTGAGGCAT TGTGACGAAG
AGCCTCATTC ACGATATCCT TCAGGCGCTG AGTCAGCGCA TCCCTGTTTG CCTTTGCTTT
CGCCTGTGCT TCCGCTGCAG CTTTTGCCCG GGCAGCCTGC TCTGCCTGTG TTTTCTTTAA
TTGAGCAGTA GACCATTTAG CAGTTGCATG AATAGCTGCA GAACTTTCAC TTTTACTGCC
TCCTTTTCCA CCTCCGCCGC CAGAGCCACT CCCGTCAGGA GTACCATTCA AAAGAGTAAT
AATTACCTGT CCCTTATCAT CATAAGGAAC ACCATCTTTA TAGTACGCTA CCGCGGTTTC
CATTATAAAA TCCTCTTTGA CTTTTAAAAC AATAAGTTAA AAATAAATAC TGTACATATA
ACCACTGGTT TTATATACAG CATAAAAGCT ACGCCGCTGC ATTTTCCCTG TCAAGACTGT
GGACTTCCAT TTTTGTGAAA ACGATCAAAA AAACAGTCTT TCACACCACG CGCTATTCTC
GCCCGATGCC ACAAAAACCA GCACAAACAT TACCGTTCTC AGACCTCATT ATGTTTTACT
GAAACTATGA GATGAGACAT CTATGGGACA CTGTCACTTT ATGGCATGGC ACACACTCCG
GGACGCACTA AAAATGACAG GCAGATCGCG TTCACAGTTT TACCGTGATA TGCGCGGAGG
CCTTGTCAGT TACCGTACCG GCAGGGACGG ACGACGGGAG TTTGAAACCA GTGAACTGAT
CCGGGCATAC GGCGAATTAA AGCAGAATGA GACACCAGAA AGGCACAGTG AGGGACATGC
AGAAAATCCA CATGATCAGC AGACAGAACG CATTCTCCGG GAACTGAATG AGCTGAAACA
ATGCCTGACG CTGATGCTTG AGGATAAACA GGCACAGGAT ATGGATCGCA GACGCCAGGA
AGCAGAACGG GAACAGCTAC AAAATGAGAT AGCCCAGCTC AGGCAGGCAC TGGAACTGGA
AAAGAAACGG GGATTCTGGT CCAGGTTGTT CGGTCGCTGA ACGCTGTCAG AGACTGATGA
TAAAATAGTC TTCGGATAAT AACTCACCGA GAATAAATAC TTTAAGGTAG GGAGACACTC
ATGAGACGTA CCGGAAACAA ACTTTGTCTT ATCGCCATGA TAACAGCAAC AGTAGCTCTC
ACAGCCTGTA CCCCAAAGGG CAGCGTGGAA CAACATACCC GGCATTACGT ATATGCTTCT
GATGACGGTT TTGATCCCAA CTTTTCCACC CAAAAAGCCG ACACAACACG AATGATGGTG
CCTTTTTTTC GGCAGTTCTG GGATATGGGA GCTAAAGACA AAGCGACAGG AAAATCACGG
AGTGATGTGC AACAACGCAT TCAGCAGTTT CACAGCCAAG AATTTTTAAA CTCACTCCGG
GGCACAACTC AATTTGCGGG TACTGATTAC CGCAGCAAAG ACCTTACCCC GAAAAAATCC
AGGCTGCTGG CTGACACGAT TTCTGCGGTT TATCTCGATG GCTACGAGGG CAGACAGTAA

GTGGATTTAC CATAATCCCT TAATTGTACG CACCGCTAAA ACGCGTTCAG CGCGATCACG 
GCAGCAGACA GGTAAAAATG GCAACAAACC ACCCTAAAAA CTGCGCGATC GCGCCTGATA 
AATTTTAACC GTATGAATAC CTATGCAACC AGAGGGTACA GGCCACATTA CCCCCACTTA 
ATCCACTGAA GCTGCCATTT TTCATGGTTT CACCATCCCA GCGAAGGGCC ATCCAGCGTG 
CGTTCCTGTA TTTCCGGCTG ACGCTCCCGT TCTAGGGATA ACACATGTTC GCGCTCCTGT

ATCAGCCGTT CCTCTCTTAT CTCCAGTTCT CGCTGTATAA CTGGCTCAAG CGTTCTGTCT
GCTCGCTCAA GTGTTGCACC TGCTGACTCA ACTGCATGAC CCGCTCGTTC AGCATCGCGT 
TGTCCCGTTG CGTAAGCGAA AACATCTTCT GCAATTCCAC GAAGGCGCTC TCCCATTCGC 
TCAGCCGCTG CATATAGTCC TGTTGCAGCT GCTCTAAGGC GTTCAGCAAA TGTGTTTCCA 
GCTCTGTCAC TCTGTGTCAC TCCTTCAGAT GTACCCACTC TTTCCCCTGA AAGGGAATCA

CCTCCGCTGA TTTCCCGTAC GGAAGGACAA GGAATTTCCT GTTCCCGTCC TGCACAAACT 
CCACGCCCCA TGTCTTCGCG TTCAGTTTCT GCAATGTCTC TTCCTGCTTC CTGATTTCTT 
CCAGGTTCGC CTGTATCCTC CCTCCAAGAT ACCAGAGCGT CCCGCCACTC GCGGTAAACA 
GGAGAAAGAC TATCCCCAGT AACATCATGC CCGTATTCCC TGCCAGCTTT AACACGTCCC
TCCTGTGCTG CATCATCGCC TCTTTCACCC CTTCCCGGTG TTTTTCCAGC GATTCCTCTG

TCGAGGCTGT GAACAGGGCT ATAGCGTCTC TGATTTTCGT CTCGTTTGAT GTCACAGCCT
CGCTTACAGA TTCGCCGAGC CTCCTGAACT CGTTGTTCAG CATTTTCTCT GTAGATTCGG 
CTCTCTCTTT CAGCTTTTTC TCGAACTCCG CGCCCGTCTG CAAAAGATTG CTCATAAAAT 
GCTCCTTTCA GCCTGATATT CTTCCCGCCG TTCGGATCTG CAATGCTGAT ACTGCTTCGC 
GTCACCCTGA CCACTTCCAG CCCCGCCTCA GTGAGCGCCT GAATCACATC CTGACGGCCT

TTTATCTCTC CGGCATGGTA AAGTGCATCT ATACCTCGCG TGACGCCCTC AGCAAGCGCC 
TGTTTCGTTT CAGGCAGGTT ATCAGGGAGT GTCAGCGTCC TGCGGTTCTC CGGGGCGTTC 
GGGTCATGCA GCCCGTAATG GTGATTTAAC AGCGTCTGCC AAGCATCAAT TCTAGGCCTG
TCTGCGCGGT CGTAGTACGG CTGGAGGCGT TTTCCGGTCT GTAGCTCCAT GTTCGGAATG 
ACAAAATTCA GCTCAAGCCG TCCCTTGTCC TGGTGCTCCA CCCACAGGAT GCTGTACTGA

TTTTTTTCGA GACCGGGCAT CAGTACACGC TCAAAGCTCG CCATCACTTT TTCACGTCCT 
CCCGGCGGCA GCTCCTTCTC CGCGAACGAC AGAACACCGG ACGTGTATTT CTTCGCAAAT 
GGCGTGGCAT CGATGAGTTC CCGGACTTCT TCCGGTATAC CCTGAAGCAC CGTTGCGCCT 
TCGCGGTTAC GCTCCCTCCC CAGCAGGTAA TCAACCGGAC CACTGCCACC ACCTTTTCCC 
CTGGCATGAA ATTTAACTAT CATCCCGCGC CCCCTGTTCC CTGACAGCCA GACGCAGCCG

GCGCAGCTCA TCCCCGATGG CCATCAGTGC GGCCACCACC TGAACCCGGT CACCGGAAGA 
CCACTGCCCG CTGTTCACCT TACGGGCTGT CTGATTCAGG TTATTTCCGA TGGCGGCCAG 
CTGACGCAGT AACGGCGGTG CCAGTGTCGG CAGTTTTCCG GAACGGGCAA CCGGCTCCCC 
CAGGCAGACC CGCCGCATCC ATACCGCCAG TTGTTTACCC TCACAGCGTT CAAGTAACCG 17280

GGCATGTTCA TCATCAGTAA CCCGTATTGT GAGCATCCTC TCGCGTTTCA TCGGTATCAT 17340

TACCCCATGA ACAGAAATCC CCCTTACACG GAGGCATCAG TGACTAAACA GGAAAAAACC 17400

GCCCTTAACA TGGCCCGCTT TATCAGAAGC CAGACATTAA CGCTGCTGGA GAAGCTCAAC 17460

GAACTGGACG CAGATGAACA GGCCGATATT TGTGAATCGC TTCACGACCA CGCCGATGAG 17520

CTTTACCGCA GCTGCCTCGC ACGTTTCGGG GATGACGGTG AAAACCTCTG ACACATGCAG 17580

CTCCCGGAGA CGGTCACAGC TTGTCTGTGA GCGGATGCCG GGAGCTGACA AGCCCGTCAG 17640

GGCGCGTCAG CAGGTTTTAG CGGGTGTCGG GGCGCAGCCC TGACCCAGTC ACGTAGCGAT 17700

AGCGGAGTGT ATACTGGCTT AACCATGCGG CATCAGTGCG GATTGTATGA AAAGTACGCC 17760

ATGCCGGGTG TGAAATGCCG CACAGATGCG TAAGGAGAAA ATGCACGTCC AGGCGCTTTT 17820

CCGCTTCCTC GCTCACTGAC TCGCTACGCT CGGTCGTTCG ACTGCGGCGA GCGGTACTGA 17880 

CTCACACAAA AACGGTAACA CAGTTATCCA CAGAATCAGG GGATAAGGCC GGAAAGAACA 17940 

TGTGAGCAAA AGACCAGGAA CAGGAAGAAG GCCACGTAGC AGGCGTTTTT CCATAGGCTC 18000 

CGCCCCCCTG ACGAGCATCA CAAAAATAGA CGCTCAAGTC AGAGGTGGCG AAACCCGACA 18060

GGACTATAAA GCTACCAGGC GTTTCCCCCT GGAAGCTCCC TCGTGCGCTC TCCTGTTCCG 18120 

ACCCTGCCGC TTACCGGATA CCTGTCCGCC TTTCTCCCTT CGGGAAGCGT GGCGCTTTCT 18180 

CATAGCTCAC GCTGTTGGTA TCTCAGTTCG GTGTAGGTCG TTCGCTCCAA GCTGGGCTGT 18240

GTGCACGAAC CCCCCGTTCA GCCCGACCGC TGCGCCTTAT CCGGTAACTA TCGTCTTGAG 18300 

TCCAACCCGG TAAGGCACGC CTTAACGCCA CTGGCAGCAG CCACTGGTAA CCGGATTAGC 18360

AGAGCGATGA TGGCACAAAC GGTGCTACAG AGTTCTTGAA GTAGTGGCCC GACTACGGCT 18420

ACACTAGAAG GACAGTATTT GGTATCTGCG CTCTGCTGAA GCCAGTTACC TTCGGAAAAA 18480 

GAGTTGGTAG CTCTTGATCC GGCAAACAAA CCACCGTTGG TAGCGGTGGT TTTTTTGTTT 18540

GCAAGCAGCA GATTACGCGC AGAAAAAAAG GATCTCAAGA AGATCCTTTA ATCTTTTCTA 18600 

CTGAACCGCG ATCCCCGTCA GTTTAGAAGA GGAGGATGGT GCGATGGTCC CTCCCTGAAC 18660 
ATCAGGTATA TAGTTAGCCT GACATCCAAC AAGGAGGTTT ATCGCGAATA TTCCCACAAA 18720 
AAATCTTTTC CTCATAACTC GATCCTTATA AAATGAAAAG AATATATGGC GAGGTTTAAT 18780 
TTATGAGCTT AAGATACTAC ATAAAAAATA TTTTATTTGG CCTGTACTGC ACACTTATAT 18840
ATATATACCT TATAACAAAA AACAGCGAAG GGTATTATTT CCTTGTGTCA GATAAGATGC 18900
TATATGCAAT AGTGATAAGC ACTATTCTAT GTCCATATTC AAAATATGCT ATTGAATACA 18960
TAGCTTTTAA CTTCATAAAG AAAGATTTTT TCGAAAGAAG AAAAAACCTA AATAACGCCC 19020
CCGTAGCAAA ATTAAACCTA TTTATGCTAT ATAATCTACT TTGTTTGGTC CTAGCAATCC 19080
CATTTGGATT GCTAGGACTT TTTATATCAA TAAAGAATAA TTAAATCCCT AACACCTCAT 19140 
TTATAGTATT AAGTTTATTC TTATCAATAT AGGAGCATAG AA 19182 
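
The listing above is set out in rows of sixty bases, divided into blocks of ten, with each numbered row ending in the position of its last base (the final row ends at 19182). The following is a minimal sketch, in Python, of how a raw sequence string might be laid out in that form. The function name `format_sequence_listing` and the toy input are illustrative assumptions and are not part of the application.

```python
def format_sequence_listing(sequence: str, group: int = 10, per_row: int = 60) -> list[str]:
    """Return rows of `per_row` bases in `group`-base blocks, each row ending
    with the position of its last base."""
    # Keep letters only, so stray spaces or digits from a copied listing are dropped.
    bases = "".join(ch for ch in sequence.upper() if ch.isalpha())
    rows = []
    for start in range(0, len(bases), per_row):
        chunk = bases[start:start + per_row]
        blocks = [chunk[i:i + group] for i in range(0, len(chunk), group)]
        rows.append(f"{' '.join(blocks)} {start + len(chunk)}")
    return rows


# Toy 102-base input (hypothetical, not taken from the listing).
toy = "ACGT" * 25 + "AC"
for row in format_sequence_listing(toy):
    print(row)
```

Because each row carries the running position, a row that has lost its number can be renumbered from its neighbours.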
1. A system for transposing a transposable DNA sequence
in vitro, the system comprising:

a Tn5 transposase modified relative to a wild type Tn5
transposase, the modified transposase comprising a change
relative to the wild type Tn5 transposase that causes the
modified transposase to bind to Tn5 outside end repeat
sequences with greater avidity than the wild type Tn5
transposase, and a change relative to the wild type Tn5
transposase that causes the modified transposase to be less
likely than the wild type transposase to assume an inactive
multimeric form;

a donor DNA molecule comprising the transposable DNA
sequence, the DNA sequence being flanked at its 5'- and 3'-ends
by the Tn5 outside end repeat sequences; and

a target DNA molecule into which the transposable element
can transpose.

2. A system as claimed in Claim 1 wherein the change that 
causes the modified transposase to bind with greater avidity is 
characterized as a substitution mutation at position 54 of the 
wild type transposase. 

3. A system as claimed in Claim 2 wherein position 54 is 
a lysine. 

4. A system as claimed in Claim 1 wherein the change that 
causes the modified transposase to be less likely to assume an 
inactive multimeric form is characterized as a substitution 
mutation at position 372 of the wild type transposase. 



5. A system as claimed in Claim 4 wherein position 372 
is a proline. 

6. A system as claimed in Claim 1 wherein the modified 
transposase further comprises a substitution mutation at 
position 56 of the wild type transposase. 
7. A system as claimed in Claim 6 wherein position 56 is
an alanine. 
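
By way of illustration only, the substitutions recited in Claims 2 through 7 (lysine at position 54, alanine at position 56, proline at position 372) can be pictured as point changes applied to a protein sequence held as a string. The sketch below is in Python and rests on stated assumptions: positions are counted from 1, the helper name `apply_substitutions` is hypothetical, and the 400-residue placeholder is not the wild type Tn5 transposase sequence.

```python
# Residues named in Claims 3, 5 and 7, keyed by 1-based position.
CLAIMED_CHANGES = {54: "K", 56: "A", 372: "P"}


def apply_substitutions(protein: str, changes: dict[int, str]) -> str:
    """Return a copy of `protein` with each 1-based position replaced
    by the given one-letter residue."""
    residues = list(protein)
    for position, new_residue in changes.items():
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        residues[position - 1] = new_residue
    return "".join(residues)


# Placeholder input only: a run of glycines standing in for a real sequence.
placeholder = "G" * 400
modified = apply_substitutions(placeholder, CLAIMED_CHANGES)
print(modified[53], modified[55], modified[371])
```

Keeping the changes in one mapping lets any subset of the three positions be applied with the same helper.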

8. A system as claimed in Claim 1 wherein the donor DNA 
molecule is flanked at its 5'- and 3'-ends by an 18 or 19 base
pair flanking DNA sequence comprising nucleotide A at position 
10, nucleotide T at position 11, and nucleotide A at position 
12. 

9. The system as claimed in Claim 8 wherein the flanking 
sequence further comprises a nucleotide at position 4 selected 
from the group consisting of A or T. 

10. The system as claimed in Claim 8 wherein the flanking 
sequence further comprises a nucleotide at position 15 selected 
from the group consisting of G or C. 

11. The system as claimed in Claim 8 wherein the flanking 
sequence further comprises a nucleotide at position 17 selected 
from the group consisting of A or T. 

12. The system as claimed in Claim 8 wherein the flanking 
sequence further comprises a nucleotide at position 18 selected 
from the group consisting of G or C. 

13. The system as claimed in Claim 8 wherein the flanking 
sequence has the sequence 5'-CTGTCTCTTATACACATCT-3'.

14. The system as claimed in Claim 8 wherein the flanking 
sequence has the sequence 5'-CTGTCTCTTATACACATCT-3'.
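
The positional limits of Claims 8 through 12 can be checked mechanically. The following is a minimal sketch in Python, assuming that positions are counted from 1 starting at the 5' end of the sequence as written; the function name `check_flanking_sequence` and the shape of its result are illustrative assumptions only.

```python
# Claim 8: length of 18 or 19, with A, T, A at positions 10, 11, 12.
REQUIRED = {10: "A", 11: "T", 12: "A"}
# Claims 9 to 12: further permitted nucleotides at positions 4, 15, 17, 18.
FURTHER = {4: "AT", 15: "GC", 17: "AT", 18: "GC"}


def check_flanking_sequence(sequence: str) -> dict:
    """Report whether a flanking sequence meets the Claim 8 limits and
    which of the further positional limits it also meets."""
    seq = sequence.upper()
    if len(seq) not in (18, 19):
        return {"claim8": False, "further": {}}
    claim8 = all(seq[pos - 1] == base for pos, base in REQUIRED.items())
    further = {pos: seq[pos - 1] in allowed for pos, allowed in FURTHER.items()}
    return {"claim8": claim8, "further": further}


for candidate in ("CTGTCTCTTATACACATCT", "CTGTCTCTTATACAGATCT"):
    print(candidate, check_flanking_sequence(candidate))
```

Both 19-base sequences recited elsewhere in the claims meet every limit under this numbering.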
15. A Tn5 transposase modified relative to a wild type
Tn5 transposase, the modified transposase comprising: 

a change relative to the wild type Tn5 transposase that 
causes the modified transposase to bind to Tn5 outside end 
repeat sequences of a donor DNA with greater avidity than the 
wild type Tn5 transposase; and 

a change relative to the wild type Tn5 transposase that 
causes the modified transposase to be less likely than the wild 
type transposase to assume an inactive multimeric form. 

16. A modified Tn5 transposase as claimed in Claim 15 
wherein the change that causes the modified transposase to bind 
with greater avidity is characterized as a substitution 
mutation at position 54 of the wild type transposase. 

17. A modified Tn5 transposase as claimed in Claim 16 
wherein position 54 is a lysine. 

18. A modified Tn5 transposase as claimed in Claim 15 
wherein the change that causes the modified transposase to be 
less likely to assume an inactive multimeric form is 
characterized as a substitution mutation at position 372 of the 
wild type transposase. 



19. A modified Tn5 transposase as claimed in Claim 18 
wherein position 372 is a proline. 

20. A modified Tn5 transposase as claimed in Claim 15 
further comprising a substitution mutation at position 56 of 
the wild type transposase. 

21. A modified Tn5 transposase as claimed in Claim 20 
wherein position 56 is alanine. 

22. A genetic construct comprising a nucleotide sequence 
that can encode a Tn5 transposase that both has greater avidity 
for Tn5 outside end repeats and is less likely to assume an 
inactive multimeric form than a wild type Tn5 transposase.
23. A genetic construct as claimed in Claim 22 comprising
a nucleotide sequence that encodes a lysine residue at amino
acid 54 of the transposase.

24. A genetic construct as claimed in Claim 22 comprising 
a nucleotide sequence that encodes a proline residue at amino 
acid 372 of the transposase. 

25. A genetic construct as claimed in Claim 22 comprising
a nucleotide sequence that encodes a lysine residue at amino
acid 54 of the transposase and a proline residue at amino acid
372 of the transposase.

26. A genetic construct as claimed in Claim 22 comprising 
the nucleotide sequence of SEQ ID NO:1.

27. A genetic construct comprising:

a transposable DNA sequence flanked at its 5' and 3' ends
by an 18 or 19 base pair flanking DNA sequence comprising 
nucleotide A at position 10, nucleotide T at position 11, and 
nucleotide A at position 12. 

28. The construct of Claim 27 further comprising, at
position 4 of the flanking sequence, a nucleotide selected from 
the group consisting of T or A. 

29. The construct of Claim 27 further comprising, at 
position 15 of the flanking sequence, a nucleotide selected 
from the group consisting of G or C. 

30. The construct of Claim 27 further comprising, at 
position 17 of the flanking sequence, a nucleotide selected 
from the group consisting of T or A. 

31. The construct of Claim 27 further comprising, at 
position 18 of the flanking sequence, a nucleotide selected 
from the group consisting of G or C. 




32. The construct as claimed in Claim 27 wherein the 
flanking sequence has the sequence 5'-CTGTCTCTTATACACATCT-3'.

33. The construct as claimed in Claim 27 wherein the 
flanking sequence has the sequence 5'-CTGTCTCTTATACAGATCT-3'.

34. A method for in vitro transposition, the method 
comprising the steps of: 

combining a donor DNA molecule that comprises a 
transposable DNA sequence of interest, the DNA sequence of 
interest being flanked at its 5'- and 3'-ends by Tn5 outside
end repeat sequences, with a target DNA molecule and a Tn5 
transposase modified relative to wild type Tn5 transposase in a 
suitable reaction buffer at a temperature below a physiological 
temperature until the modified transposase binds to the outside 
end repeat sequences; and 

raising the temperature to a physiological temperature for 
a period of time sufficient for the enzyme to catalyze in vitro
transposition,

wherein the modified transposase comprises a change 
relative to the wild type Tn5 transposase that causes the 
modified transposase to bind to the Tn5 outside end repeat 
sequences with greater avidity than the wild type Tn5 
transposase, and a change relative to the wild type Tn5 
transposase that causes the modified transposase to be less 
likely than the wild type transposase to assume an inactive 
multimeric form. 

35. A method as claimed in Claim 34 wherein the change 
that causes the modified transposase to bind with greater 
avidity is characterized as a substitution mutation at position 
54 of the wild type transposase. 

36. A method as claimed in Claim 35 wherein position 54 
is a lysine. 
37. A method as claimed in Claim 34 wherein the change
that causes the modified transposase to be less likely to
assume an inactive multimeric form is characterized as a
substitution mutation at position 372 of the wild type
transposase.

38. A method as claimed in Claim 37 wherein position 372 
is a proline. 

39. A method as claimed in Claim 34 wherein the modified 
transposase further comprises a substitution mutation at 
position 56 of the wild type transposase. 

40. A method as claimed in Claim 39 wherein position 56 
is an alanine. 

41. A method as claimed in Claim 34 wherein the DNA 
sequence of interest is flanked at its 5'- and 3'-ends by an 18
or 19 base pair flanking DNA sequence comprising nucleotide A
at position 10, nucleotide T at position 11, and nucleotide A
at position 12.

42. The method as claimed in Claim 41 wherein the 
flanking sequence further comprises a nucleotide at position 4 
selected from the group consisting of A or T. 

43. The method as claimed in Claim 41 wherein the 
flanking sequence further comprises a nucleotide at position 15 
selected from the group consisting of G or C. 

44. The method as claimed in Claim 41 wherein the 
flanking sequence further comprises a nucleotide at position 17 
selected from the group consisting of A or T.

45. The method as claimed in Claim 41 wherein the 
flanking sequence further comprises a nucleotide at position 18 
selected from the group consisting of G or C. 
46. The method as claimed in Claim 41 wherein the
flanking sequence has the sequence 5'-CTGTCTCTTATACACATCT-3'.

47. The method as claimed in Claim 41 wherein the
flanking sequence has the sequence 5'-CTGTCTCTTATACAGATCT-3'.